(57) Abstract: The invention provides a variety of reagents for use in the diagnosis and management of breast cancer. The invention 
utilizes cDNA microarray technology to identify genes whose expression profile across a large group of tumor samples correlates 
with that of cytokeratin 5 and cytokeratin 17, markers for basal cells of the normal mammary lactation gland. The invention 
demonstrates that tumors that express cytokeratin 5/6 and/or 17 have a poor prognosis relative to tumors overall. The invention provides 
basal marker genes and their expression products and uses of these genes for diagnosis of breast cancer and for identification of 
therapies for breast cancer. In particular, the invention provides basal marker genes including cadherin 3, matrix metalloproteinase 
14, and cadherin EGF LAG seven-pass G-type receptor 2. The invention provides antibodies to the polypeptides expressed by these 
genes and methods of use thereof. 
BASAL CELL MARKERS IN BREAST CANCER AND USES THEREOF 

GOVERNMENT SUPPORT 
The U.S. Government has a paid-up license in this invention and the right in 
limited circumstances to require the patent owner to license others on reasonable 
terms as provided for by the terms of Grant No. NIH CA 77097 awarded by the 
National Cancer Institute. 

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application claims priority to provisional application U.S.S.N. 
60/220,967, filed July 26, 2000, which is incorporated herein by reference. 

BACKGROUND OF THE INVENTION 
A major challenge of cancer treatment is to target specific therapies to distinct 
tumor types in order to maximize efficacy and minimize toxicity. A related challenge 
lies in the attempt to provide accurate diagnostic, prognostic, and predictive 
information. At present, tumors are described with the tumor-node-metastasis (TNM) 
system. This system, which uses the size of the tumor, the presence or absence of 
tumor in regional lymph nodes, and the presence or absence of distant metastases to 
assign a stage to the tumor, is described in the American Joint Committee on Cancer: 
AJCC Cancer Staging Manual. Philadelphia, Pa: Lippincott-Raven Publishers, 5th ed., 
1997, pp 171-180, and in Harris, J.R.: "Staging of breast carcinoma" in Harris, J.R., 
Hellman, S., Henderson, I.C., Kinne D.W. (eds.): Breast Diseases. Philadelphia, 
Lippincott, 1991. The assigned stage is used as a basis for selection of appropriate 
therapy and for prognostic purposes. In addition to the TNM parameters, 
morphologic appearance is used to further classify tumors and thereby aid in selection 
of appropriate therapy. However, this approach has serious limitations. Tumors with 
similar histopathologic appearance can exhibit significant variability in terms of 
clinical course and response to therapy. For example, some tumors are rapidly 
progressive while others are not. Some tumors respond readily to hormonal therapy 
or chemotherapy while others are resistant. 



WO 02/08765 PCT/US01/23843 



Assays for cell surface markers, e.g., using immunohistochemistry, have 
provided means for dividing certain tumor types into subclasses. For example, one 
factor considered in prognosis and in treatment decisions for breast cancer is the 
presence or absence of the estrogen receptor (ER) in tumor samples. ER-positive 
breast cancers typically respond much more readily than ER-negative tumors to 
hormonal therapies such as tamoxifen, which acts as an anti-estrogen in breast tissue. 
Though useful, these analyses only partly predict the clinical behavior of breast 
tumors. There is phenotypic diversity present in breast cancers that current diagnostic 
tools fail to detect. Therefore, there exists a need for improved methods for 
classifying tumors. 

Mutation or dysregulation of any of a large number of genes contributes to the 
development and progression of cancer as discussed in Hanahan, D. and Weinberg, 
R., The Hallmarks of Cancer, Cell, 100, 57-70, 2000. Genes that play a role in cancer 
can be divided into a number of broad classes including oncogenes, tumor suppressor 
genes, and genes that regulate apoptosis. Oncogenes such as ras typically encode 
proteins whose activities promote cell growth and/or division, a function that is 
necessary for normal physiological processes such as development, tissue 
regeneration, and wound healing. However, inappropriate activity or expression of 
oncogenes can lead to the uncontrolled cell proliferation that is a feature of cancer. 
Tumor suppressor genes such as Rb act as negative regulators of cell proliferation. 
Loss of their activity, e.g., due to mutations or decreased expression at the level of 
mRNA or protein, can lead to unrestrained cell division. A number of familial cancer 
syndromes and inherited susceptibility to cancer are believed to be caused by 
mutations in tumor suppressor genes. Apoptosis, or programmed cell death, plays 
important roles both in normal development and in surveillance to eliminate cells 
whose survival may be deleterious to the organism, e.g., cells that have acquired DNA 
damage. Many chemotherapeutic agents are believed to work by activating the 
endogenous apoptosis pathway in tumor cells. 

Although a substantial number of genes have been implicated as playing 
important roles in cancer, the factors responsible for the phenotypic diversity of 
tumors remain largely unknown. In particular, understanding of the underlying 
differences in gene expression that may contribute to tumor phenotype is limited. 
Understanding the differences in gene expression between normal and cancerous 
tissue and between different tumors of the same tissue type is of significant 
diagnostic, prognostic, and therapeutic utility. There is therefore a need for the 
identification of genes exhibiting differential expression between tumors. In 
particular, there is a need for the identification of additional genes and proteins that 
can be used to classify tumors, especially genes and proteins that can provide 
diagnostic, prognostic, and/or predictive information in cancer. There is also a need 
for antibodies and other reagents for the detection and measurement of such genes and 
proteins. 

Most of the commonly used chemotherapeutic agents act relatively 
nonselectively. Rather than specifically killing tumor cells, these agents target any 
dividing cell, resulting in a variety of adverse effects. In addition, current therapeutic 
strategies are of limited efficacy, and the mortality rate of breast cancer remains high. 
There is therefore a need for the identification of additional genes and proteins that 
can be used as targets for the treatment of cancer. There is also a need for antibodies 
and other reagents that can modulate, regulate, or interact with these genes and 
proteins to provide new methods of treatment for cancer. 

SUMMARY OF THE INVENTION 

The present invention relates to the identification of markers that are useful in 
classifying tumors, particularly breast tumors. The markers identify a class of tumors 
whose cells have characteristics of basal cells of normal breast lactation ducts. The 
markers were identified based on their expression profiles in human breast tumor 
samples, normal breast tissue, and cell lines as assessed using cDNA microarrays. In 
particular, the basal cell markers of the present invention were identified based on the 
similarity of their mRNA expression patterns to the expression patterns of markers 
previously known to identify breast duct basal cells, e.g., cytokeratin 5 and 
cytokeratin 17, across a set of breast tumor samples. The basal markers include the 
three genes known as cadherin 3 or P-cadherin (SEQ ID NO:1; GenBank protein 
accession number NP_001399; GenBank cDNA accession number NM_001408), 
matrix metalloproteinase 14 (SEQ ID NO:2; GenBank protein accession number 
NP_004986; GenBank cDNA accession number NM_004995), and cadherin EGF 
LAG seven-pass G-type receptor 2 or EGF-Like Domain, Multiple 2 (SEQ ID NO:3; 
GenBank protein accession number NP_001784; GenBank cDNA accession number 
NM_001793). The invention further provides antibodies that specifically bind to the 
polypeptides encoded by the basal marker genes identified herein. The antibodies 
recognize basal cells of normal mammary lactation glands. 
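
The selection of markers by similarity of expression pattern can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure, which this section does not specify, and it uses invented values and hypothetical gene labels ("gene_A", "gene_B") purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_by_similarity(profiles, reference_gene):
    """Rank genes by correlation with the reference profile, highest first."""
    ref = profiles[reference_gene]
    scores = {g: pearson(p, ref) for g, p in profiles.items() if g != reference_gene}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical expression values across five tumor samples (illustrative only).
profiles = {
    "cytokeratin 5": [2.0, 1.5, -1.0, -0.5, 1.8],
    "gene_A": [1.8, 1.2, -0.8, -0.6, 1.5],    # tracks the reference marker
    "gene_B": [-1.0, -0.5, 1.2, 0.9, -1.1],   # moves opposite to the reference
}
ranking = rank_by_similarity(profiles, "cytokeratin 5")
```

Genes at the top of such a ranking would be candidates for further evaluation; the cutoff for calling a profile "similar" is left open here because the text does not state one.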

The invention provides various diagnostic methods based on the reagents 
mentioned above. The diagnostic methods include methods for classifying a tumor. 
In particular, the invention allows classification of a breast tumor as belonging to a 
basal class of breast tumors. According to certain of the inventive methods the 
presence or amount of a gene product, e.g., a polypeptide or a nucleic acid, encoded 
by a basal marker gene is detected in a sample derived from a subject (e.g., a sample 
of tissue or cells obtained from a tumor or a blood sample obtained from a subject). 
In general the subject is a human; however, the subject may also be an animal of any 
other kind. The subject may be an individual who has or may have a tumor. The 
sample may be subjected to various processing steps prior to or in the course of 
detection. In certain embodiments of the invention the gene product is a polypeptide 
that is detected using an antibody capable of binding to the polypeptide. In certain 
embodiments of the invention the antibody is used to perform immunohistochemical 
staining on a sample obtained from a subject. In certain embodiments of the invention 
basal marker gene mRNA expression is measured using a microarray. In other 
embodiments of the invention basal marker gene mRNA expression is measured by 
quantitative PCR using a set of primers designed to amplify a portion of the gene. 
Additional detection means that may be employed in the present invention are 
described in U.S. Patent No. 6,057,105. In any of the methods for tumor 
classification and diagnosis, it may be advantageous to detect and/or measure 
expression of a set of basal markers rather than expression of a single marker. 
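
As a rough illustration of classifying with a set of markers rather than a single one, the following sketch calls a tumor "basal" when at least a chosen number of markers meet a cutoff. The decision rule, the cutoff values, the `min_positive` parameter, and the sample measurements are all assumptions made for this example; the text does not prescribe them.

```python
def classify_tumor(expression, cutoffs, min_positive=2):
    """Return a (label, positive_markers) pair for one tumor sample.

    expression maps marker name -> measured level; cutoffs maps marker
    name -> hypothetical positivity threshold. A marker that was not
    measured is treated as negative.
    """
    positive = sorted(
        marker for marker, cutoff in cutoffs.items()
        if expression.get(marker, float("-inf")) >= cutoff
    )
    label = "basal" if len(positive) >= min_positive else "non-basal"
    return label, positive

# Hypothetical cutoffs and measurements (arbitrary units, illustrative only).
cutoffs = {
    "cadherin 3": 1.0,
    "matrix metalloproteinase 14": 1.0,
    "cadherin EGF LAG seven-pass G-type receptor 2": 1.0,
}
sample = {
    "cadherin 3": 2.3,
    "matrix metalloproteinase 14": 0.4,
    "cadherin EGF LAG seven-pass G-type receptor 2": 1.6,
}
label, positive = classify_tumor(sample, cutoffs)
```

Requiring agreement among several markers is one simple way to reduce the influence of a single noisy measurement; other combination rules would fit the text equally well.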

By providing reagents that may reliably be used to classify tumors as 
belonging to a basal subclass, the invention enables a variety of methods for 
improving therapeutic options for patients with breast cancer. Much effort has been, 
and continues to be, expended on the discovery of new chemotherapeutic agents. These 
agents are tested for efficacy in clinical trials. In many such trials it is noticed that a 
small number of patients stabilize or improve while receiving the treatment, while 
most patients do not appear to benefit. Most such agents are not further developed for 
a number of reasons. For example, the clinical trial results may not be adequate to 
gain approval by the Food and Drug Administration. In addition, a pharmaceutical 
company may determine that the potential market for the drug is too small to justify 
further efforts. However, if it were possible to identify those patients likely to 
respond to the treatment, then it would be possible to design clinical trials that would 
show efficacy, and it would be possible to appropriately select patients who would 
benefit from the treatment. In addition, the availability of markers that can be used to 
classify breast tumors enables the retrospective examination of the thousands of breast 
tumor samples archived in hospitals and pathology labs. These samples can be 
classified using the inventive reagents and classification scheme, and the results can 
be correlated with the clinical outcome, based on medical records. Thus it is possible 
to determine whether tumors that fall into a particular tumor class, e.g., a basal tumor 
class, are responsive to a particular treatment. This will enable the re-evaluation of 
drugs that failed in clinical trials and may identify a subset of tumors that are likely to 
respond to a particular drug, and thus a subset of patients that are likely to benefit 
from treatment with that drug. 

The inventors have recognized that in order to achieve these goals it is 
necessary to develop new and improved methods for classifying breast tumors. The 
inventive methods provide a molecular basis for classifying tumors, based on their 
underlying biology. While not wishing to be bound by any theory, the inventors 
postulate that tumors arising from a particular cell type within the breast are likely to 
display common features. Such features may include the prognosis (e.g., predicted 
survival time or likelihood that a patient's life expectancy exceeds a given length of 
time) or likelihood that a tumor will respond to a particular therapy. 

In particular, tumors that display characteristics of basal cells of the normal 
breast lactation duct (also referred to herein as breast basal cells) form a distinct 
subclass (referred to herein as the basal subclass). The inventors have confirmed that 
patients with breast tumors whose cells display characteristics of breast basal cells, 
e.g., expression of cytokeratin 5 and/or cytokeratin 17, have a poor clinical outcome 
relative to patients with breast tumors that do not express these markers. However, 
antibodies to these cytokeratins have been found (by the inventors and by other 
investigators) to give spotty, focal staining patterns when used to perform 
immunohistochemistry on breast tumor samples. Thus the utility of cytokeratins 5 
and 17 as markers and the utility of antibodies that bind to cytokeratin 5 or 17 for 
determining whether a tumor is a member of the basal subclass have been limited. The 
inventors have therefore identified genes whose mRNA expression profiles across a 
large set of tumor samples correlate with, i.e., are similar to, the expression profiles of 
the known basal cell markers cytokeratins 5 and 17. These genes include the basal 
markers of the present invention mentioned above. As described in Examples 10 and 
13, the inventors have generated antibodies to the proteins expressed by these genes 
and shown that the antibodies stain basal cells of normal mammary lactation glands. 
Thus detection of one or more expression products of these genes may be used to 
identify tumors that fall within the basal tumor subclass. 

The invention further provides therapeutic agents based on the identification of 
breast basal cell markers. The therapeutic agents include compounds that modulate 
these genes or that modulate polypeptides encoded by these genes. In particular, the 
therapeutic agents include antibodies that bind to polypeptides encoded by the basal 
cell marker genes. The invention further includes agonists and antagonists to the 
basal marker genes, to the polynucleotides transcribed from those genes, and to their 
encoded polypeptides. The invention also provides methods for identifying such 
agonists and antagonists. The invention further includes pharmaceutical compositions 
comprising such antibodies, agonists, and antagonists as well as methods of use of the 
pharmaceutical compositions in the treatment of cancer, particularly breast cancer. 

According to one aspect, the invention provides a method of classifying a 
tumor comprising the steps of (i) providing a tumor sample, (ii) detecting expression 
or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the sample, and (iii) 
classifying the tumor as belonging to a tumor subclass based on the results of the 
detecting step. The invention also provides a method of classifying a tumor 
comprising the steps of (i) providing a tumor sample, (ii) detecting expression or 
activity of a gene encoding the polypeptide of SEQ ID NO:2 in the sample, and (iii) 
classifying the tumor as belonging to a tumor subclass based on the results of the 
detecting step. In addition, the invention provides a method of classifying a tumor 
comprising the steps of (i) providing a tumor sample, (ii) detecting expression or 
activity of a gene encoding the polypeptide of SEQ ID NO:3 in the sample, and (iii) 
classifying the tumor as belonging to a tumor subclass based on the results of the 
detecting step. The invention further includes a method of classifying a tumor 
comprising the steps of (i) providing a tumor sample, (ii) detecting expression or 
activity of at least two genes selected from the group consisting of: a gene encoding 
the polypeptide of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3 in the sample, 
and (iii) classifying the tumor as belonging to a tumor subclass based on the results of 
the detecting step. In any of the foregoing methods the detecting step may comprise 
detecting the polypeptide or polypeptides encoded by the genes. A variety of 
detection techniques may be employed including, but not limited to, 
immunohistochemical analysis, ELISA assay, antibody arrays, or detecting 
modification of a substrate by the polypeptide. 

In certain embodiments of the methods the tumor is a breast tumor and the 
tumor subclass is a basal tumor subclass. The methods may further comprise 
providing diagnostic, prognostic, or predictive information based on the classifying 
step. Classifying may include stratifying the tumor (and thus stratifying a subject 
having the tumor), e.g., for a clinical trial. The methods may further comprise 
selecting a treatment based on the classifying step. 

In another aspect, the invention provides a method of testing a subject 
comprising the steps of (i) providing a sample isolated from a subject, (ii) detecting 
expression or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the 
sample, and (iii) providing diagnostic, prognostic, or predictive information based on 
the detecting step. The invention further provides a method of testing a subject 
comprising the steps of (i) providing a sample isolated from a subject, (ii) detecting 
expression or activity of a gene encoding the polypeptide of SEQ ID NO:2 in the 
sample, and (iii) providing diagnostic, prognostic, or predictive information based on 
the detecting step. The invention further provides a method of testing a subject 
comprising the steps of (i) providing a sample isolated from a subject, (ii) detecting 
expression or activity of a gene encoding the polypeptide of SEQ ID NO:3 in the 
sample, and (iii) providing diagnostic, prognostic, or predictive information based on 
the detecting step. The invention further includes a method of testing a subject 
comprising the steps of (i) providing a sample isolated from the subject, (ii) detecting 
expression or activity of at least two genes selected from the group consisting of: a 
gene encoding the polypeptide of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3 in 
the sample, and (iii) providing diagnostic, prognostic, or predictive information based 
on the detecting step. In any of these methods the detecting step may comprise 
detecting the polypeptide or polypeptides. Detection may be performed using any 
appropriate technique including, but not limited to, immunohistochemistry, ELISA 
assay, protein array, or detecting modification of a substrate by the polypeptide. 

The sample may comprise mRNA, in which case the detecting step may 
comprise hybridizing the mRNA or cDNA or RNA synthesized from the mRNA to a 
microarray or detecting mRNA transcribed from the gene or detecting cDNA or RNA 
synthesized from mRNA transcribed from the gene. In any of the above methods, the 
sample may be a blood sample, a urine sample, a serum sample, an ascites sample, a 
saliva sample, a cell, or a portion of tissue. 

In another aspect, the invention provides a kit for diagnosis of a tumor which 
may include (i) primers for amplifying an mRNA transcribed from a gene that 
encodes the polypeptide of any of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3; 
(ii) instructions for use of the kit; and/or (iii) control samples for testing the primers, 
wherein the control samples comprise nucleic acids that hybridize to the primers. 

In another aspect, the invention provides an antibody that specifically binds to 
an epitope found in a polypeptide whose amino acid sequence comprises the amino 
acid sequence of SEQ ID NO:1, and wherein the antibody recognizes basal cells in 
normal mammary lactation glands. According to certain embodiments of the 
invention the antibody distinguishes basal cells from luminal cells in normal 
mammary lactation gland. According to certain embodiments of the invention the 
antibody recognizes an epitope found in a peptide having an amino acid sequence 
selected from the group consisting of SEQ ID NO:4, SEQ ID NO:5, and SEQ ID 
NO:6. 

In another aspect, the invention provides an antibody that specifically binds to 
an epitope found in a polypeptide whose amino acid sequence comprises the amino 
acid sequence of SEQ ID NO:2, and wherein the antibody recognizes basal cells in 
normal mammary lactation glands. According to certain embodiments of the 
invention the antibody distinguishes basal cells from luminal cells in normal 
mammary lactation gland. According to certain embodiments of the invention the 
antibody recognizes an epitope found in a peptide having an amino acid sequence 
selected from the group consisting of SEQ ID NO:7, SEQ ID NO:8, and SEQ ID 
NO:9. 

In another aspect, the invention provides an antibody that specifically binds to 
an epitope found in a polypeptide whose amino acid sequence comprises the amino acid 
sequence of SEQ ID NO:3, and wherein the antibody recognizes basal cells in normal 
mammary lactation glands. According to certain embodiments of the invention the 
antibody distinguishes basal cells from luminal cells in normal mammary lactation 
gland. According to certain embodiments of the invention the antibody recognizes an 
epitope found in a peptide having an amino acid sequence selected from the group 
consisting of SEQ ID NO:10, SEQ ID NO:11, and SEQ ID NO:12. 

The invention further provides a kit for tumor diagnosis comprising one or 
more of the foregoing antibodies. The kit may further include instructions for use of 
the kit and/or a control slide comprising breast tissue samples for testing reagents in 
the kit or such samples themselves. 

According to another aspect, the invention provides a method of testing a 
compound or a combination of compounds for activity against tumors comprising 
steps of (i) obtaining or providing tumor samples taken from subjects who have been 
treated with the compound or combination of compounds, wherein the tumors fall 
within a tumor subclass, (ii) comparing the response rate of tumors that fall within the 
tumor subclass and have been treated with the compound with the overall response 
rate of tumors that have been treated with the compound or combination of 
compounds or with the response rate of tumors that do not fall within the subclass and 
have been treated with the compound or combination of compounds, and (iii) 
identifying the compound or combination of compounds as having selective activity 
against tumors in the tumor subclass if the response rate of tumors in the subclass is 
greater than the overall response rate or the response rate of tumors that do not fall 
within the subclass. In certain embodiments of the invention the tumors are breast 
tumors. In certain embodiments of the invention the tumor subclass is a basal tumor 
subclass. The tumors may be classified according to any of the inventive 
classification methods described above. In certain embodiments of the invention the 
classification is based on expression of the polypeptide of SEQ ID NO:1, 2, 3, or a 
combination of these. 
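
The response-rate comparison in steps (ii) and (iii) can be sketched as follows. The record layout, the function names, and the data are hypothetical, and the sketch applies the bare "greater than" comparison stated above without any statistical significance test, which a real analysis would presumably add.

```python
def response_rate(records):
    """Fraction of tumors that responded.

    records is a list of (in_subclass, responded) boolean pairs.
    """
    if not records:
        raise ValueError("no tumors to evaluate")
    return sum(1 for _, responded in records if responded) / len(records)

def selective_activity(records):
    """Compare the subclass response rate with the overall and non-subclass rates."""
    in_subclass = [r for r in records if r[0]]
    outside = [r for r in records if not r[0]]
    sub_rate = response_rate(in_subclass)
    overall_rate = response_rate(records)
    selective = sub_rate > overall_rate
    if outside:
        selective = selective or sub_rate > response_rate(outside)
    return sub_rate, overall_rate, selective

# Hypothetical treated tumors: (falls within subclass, responded to compound).
records = (
    [(True, True)] * 3 + [(True, False)] * 1
    + [(False, True)] * 1 + [(False, False)] * 5
)
sub_rate, overall_rate, selective = selective_activity(records)
```

With these invented records, three of four subclass tumors respond against four of ten overall, so the compound would be flagged as having selective activity under the stated rule.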

The invention further provides a method of testing a compound or a 
combination of compounds for activity against tumors comprising steps of (i) treating 
subjects in need of treatment for tumors with the compound or combination of 
compounds, (ii) comparing the response rate of tumors that fall within a tumor 
subclass with the overall response rate of tumors or with the response rate of tumors 
that do not fall within the subclass, and (iii) identifying the compound or combination 
of compounds as having selective activity against tumors in the tumor subclass if the 
response rate of tumors in the subclass is greater than the overall response rate or the 
response rate of tumors that do not fall within the subclass. The method may further 
comprise various additional steps. For example, the method may comprise steps of (i) 
providing tumor samples from subjects in need of treatment for tumors, (ii) 
determining whether the tumors fall within a tumor subclass, and (iii) stratifying the 
subjects based on the results of the determining step prior to performing the treating 
step. The method may further comprise the steps of (i) providing tumor samples from 
subjects in need of treatment for tumors, (ii) detecting expression or activity of a gene 
encoding the polypeptide of SEQ ID NO:1 in the samples, and (iii) stratifying the 
subjects based on the results of the detecting step prior to performing the treating step. 
The method may further comprise the steps of (i) providing tumor samples from 
subjects in need of treatment for tumors, (ii) detecting expression or activity of a gene 
encoding the polypeptide of SEQ ID NO:2 in the samples, and (iii) stratifying the 
subjects based on the results of the detecting step prior to performing the treating step. 
The method may further comprise the steps of (i) providing tumor samples from 
subjects in need of treatment for tumors, (ii) detecting expression or activity of a gene 
encoding the polypeptide of SEQ ID NO:3 in the samples, and (iii) stratifying the 
subjects based on the results of the detecting step prior to performing the treating step. 
The method may further comprise the steps of (i) providing tumor samples from 
subjects in need of treatment for tumors, (ii) detecting expression or activity of a gene 
encoding a polypeptide whose sequence comprises a sequence selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3 in the samples, and 
(iii) stratifying the subjects based on the results of the detecting step prior to 
performing the treating step. 

In addition, the invention includes a method of testing a compound or a 
combination of compounds for activity against tumors comprising steps of (i) treating 
subjects in need of treatment for tumors with the compound or combination of 
compounds or with an alternate compound, wherein the tumors fall within a tumor 
subclass, (ii) comparing the response rate of tumors treated with the compound or 
combination of compounds with the response rate of tumors treated with the alternate 
compound, and (iii) identifying the compound or combination of compounds as 
having superior activity against tumors in the tumor subclass, as compared with the 
alternate compound, if the response rate of tumors treated with the compound or 
combination of compounds is greater than the response rate of tumors treated with the 
alternate compound. The method may further comprise various additional steps. For 
example, the method may comprise steps of (i) providing tumor samples from 
subjects in need of treatment for tumors, (ii) determining whether the tumors fall 
within a tumor subclass, and (iii) stratifying the subjects based on the results of the 
determining step prior to performing the treating step. The method may further 
comprise the steps of (i) providing tumor samples from subjects in need of treatment 
for tumors, (ii) detecting expression or activity of a gene encoding the polypeptide of 
SEQ ID NO:1 in the samples, and (iii) stratifying the subjects based on the results of 
the detecting step prior to performing the treating step. The method may further 
comprise the steps of (i) providing tumor samples from subjects in need of treatment 
for tumors, (ii) detecting expression or activity of a gene encoding the polypeptide of 
SEQ ID NO:2 in the samples, and (iii) stratifying the subjects based on the results of 
the detecting step prior to performing the treating step. The method may further 
comprise the steps of (i) providing tumor samples from subjects in need of treatment 
for tumors, (ii) detecting expression or activity of a gene encoding the polypeptide of 
SEQ ID NO:3 in the samples, and (iii) stratifying the subjects based on the results of 
the detecting step prior to performing the treating step. The method may further 
comprise the steps of (i) providing tumor samples from subjects in need of treatment 
for tumors, (ii) detecting expression or activity of a gene encoding a polypeptide 
whose sequence comprises a sequence selected from the group consisting of SEQ ID 
NO:1, SEQ ID NO:2, and SEQ ID NO:3 in the samples, and (iii) stratifying the 
subjects based on the results of the detecting step prior to performing the treating step. 
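
One way to picture stratifying subjects before the treating step is the sketch below, which groups subjects by marker status and then balances the two treatment arms within each group. The seeded shuffle, the alternating assignment, the stratum labels, and the subject IDs are assumptions made for illustration; the text does not say how strata are to be used once formed.

```python
import random
from collections import defaultdict

def stratified_assignment(marker_status, seed=0):
    """Assign subjects to 'compound' or 'alternate' arms, balanced per stratum.

    marker_status maps a subject ID to True (marker detected) or False.
    Returns a dict mapping subject ID -> (stratum, arm).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strata = defaultdict(list)
    for subject, detected in sorted(marker_status.items()):
        stratum = "marker-positive" if detected else "marker-negative"
        strata[stratum].append(subject)

    assignment = {}
    for stratum, subjects in strata.items():
        rng.shuffle(subjects)
        # Alternate arms so the two arms differ by at most one subject.
        for index, subject in enumerate(subjects):
            arm = "compound" if index % 2 == 0 else "alternate"
            assignment[subject] = (stratum, arm)
    return assignment

# Hypothetical subjects and marker results (illustrative only).
status = {"S1": True, "S2": True, "S3": False, "S4": False, "S5": True, "S6": False}
assignment = stratified_assignment(status)
```

Balancing arms inside each stratum keeps marker status from being confounded with treatment when the response rates are later compared.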

In certain embodiments of the invention the alternate compound is a 
compound approved by the U.S. Food and Drug Administration for treatment of 
tumors. The invention also provides a method of treating a subject comprising steps 
of (i) identifying a subject as having a tumor in a basal tumor subclass, and (ii) 
administering to the subject a compound identified according to any of the inventive 
methods for identifying a subject. 

In another aspect, the invention provides a method of treating a subject 
comprising steps of (i) providing a subject in need of treatment for cancer, (ii) 
administering to the subject an antibody that specifically binds to a polypeptide 
having an amino acid sequence comprising the sequence of SEQ ID NO:1, SEQ ID 
NO:2, or SEQ ID NO:3 or administering a combination of such antibodies. In certain 
embodiments of the invention the tumor is a breast tumor. In certain embodiments of 
the invention the antibody is conjugated with a toxic molecule. 

The invention further provides a method of treating a subject comprising steps 
of (i) providing a subject in need of treatment for cancer, (ii) administering to the 
subject a compound that activates or inhibits a gene that encodes an amino acid 
having a sequence comprising the sequence of SEQ ID NO:1, SEQ ID NO:2, or SEQ 
ID NO:3, or that activates or inhibits an expression product of the gene. 

In another aspect, the invention provides a composition comprising two or 
more compounds identified according to any of the methods described above for 
identifying compounds. The invention also provides a pharmaceutical composition 
comprising such a composition and a pharmaceutically acceptable carrier. The 
invention also provides a composition comprising (i) a compound identified according 
to any of the methods described above for identifying compounds and (ii) a second 
compound, wherein the second compound is approved by the U.S. Food and Drug 
Administration for the treatment of cancer or has shown potential efficacy against 
cancer in pre-clinical studies. The invention also provides a pharmaceutical 
composition comprising such a composition and a pharmaceutically acceptable 
carrier. 

The present application refers to various patents, publications, books, articles, 
and other references. The contents of all of these items are hereby incorporated by 
reference in their entirety. The present application also incorporates by reference six 
U.S. patent applications filed by inventors on July 26, 2001. These applications are 
entitled "REAGENTS AND METHODS FOR USE IN MANAGING BREAST 
CANCER", "BSTP-RAS/RERG PROTEIN AND RELATED REAGENTS AND 
METHODS OF USE THEREOF", "BSTP-ECG1 PROTEIN AND RELATED 
REAGENTS AND METHODS OF USE THEREOF", "BSTP-CAD PROTEIN AND 
RELATED REAGENTS AND METHODS OF USE THEREOF", "BSTP-TRANS 
PROTEIN AND RELATED REAGENTS AND METHODS OF USE THEREOF", and 
"BSTP-5 PROTEINS AND RELATED REAGENTS AND METHODS OF USE 
THEREOF". 

BRIEF DESCRIPTION OF THE DRAWING

Figure 1A presents the amino acid sequence of the polypeptide encoded by the basal
marker gene known as cadherin 3 or P-cadherin (SEQ ID NO:1).

Figure 1B presents the amino acid sequence of the polypeptide encoded by the basal
marker gene known as matrix metalloproteinase 14 (SEQ ID NO:2).

Figure 1C presents the amino acid sequence of the polypeptide encoded by the basal
marker gene known as cadherin EGF LAG seven-pass G-type receptor 2 or EGF-Like
Domain, Multiple 2 (SEQ ID NO:3).

Figure 1D presents the amino acid sequences of peptides used to raise antibodies that
recognize the cadherin 3, matrix metalloproteinase 14, cadherin EGF LAG seven-pass
G-type receptor 2, and cytokeratin 17 proteins.

Figure 2 shows a comparison of dendrograms representing the results of hierarchical
clustering of experimental samples using the intrinsic gene set and the
epithelial-enriched gene set.

Figure 3 shows breast tissue immunohistochemistry results obtained using various
antibodies.

Figure 3A shows tumor Stanford 2-P stained for immunoglobulin light chain.

Figure 3B shows tumor Stanford 16 stained for the T-lymphocyte cell surface antigen
CD3.

Figure 3C shows normal mammary duct stained for the basal epithelial cell keratins
5/6.

Figure 3D shows normal mammary duct stained for the luminal cell keratins 8/18.

Figure 3E shows tumor New York 3 stained for keratins 5/6.

Figure 3F shows tumor Stanford 16 stained for keratins 8/18.

Figure 4A shows a Western blot demonstrating expression of the cadherin 3
polypeptide in various cell lines.

Figure 4B shows a Western blot demonstrating expression of the matrix
metalloproteinase 14 polypeptide in various cell lines.

Figure 4C shows a Western blot demonstrating expression of the cadherin EGF LAG
seven-pass G-type receptor 2 polypeptide in various cell lines.

Figure 5A shows a Kaplan-Meier survival curve demonstrating poor outcome in
cytokeratin 17 and/or cytokeratin 5/6 positive tumors (p = 0.012).

Figure 5B shows a Kaplan-Meier survival curve demonstrating poor outcome in
cytokeratin 17 and/or cytokeratin 5/6 positive tumors in lymph node negative patients
(p = 0.006).

Figure 6 shows antibody staining of normal breast tissue cores in a breast tissue array.

Figure 6A shows staining with anti-cytokeratin 5/6 monoclonal antibody.

Figure 6B shows staining with anti-cadherin 3 polyclonal antibody.

Figure 6C shows staining with anti-EGF LAG seven-pass G-type receptor 2
polyclonal antibody.

Figure 6D shows staining with anti-metalloproteinase 14 polyclonal antibody.

Figure 7 shows antibody staining of breast cancer tissue cores in a breast cancer tissue
array.

Figure 7A shows antibody staining with anti-cytokeratin 5/6 monoclonal antibody.

Figure 7B shows antibody staining with anti-EGF LAG seven-pass G-type receptor 2
polyclonal antibody.

Figure 7C shows antibody staining with anti-cadherin 3 polyclonal antibody.

BRIEF DESCRIPTION OF THE TABLES

The tables contain the numerical data corresponding to microarray images. Some
tables list the individual genes in the various gene subsets or provide additional
information.

Table 1 is a master data table for the 65 microarray experiments performed on
individual tumor samples, in which rows represent I.M.A.G.E. clones that identify
approximately 1753 genes whose expression varied by at least a factor of 4 and
columns represent individual microarray experiments. The first 50 pages of the table
consist of a reference list in which a descriptive name for each clone (where such a
name exists) appears in the column entitled Name, followed by the Genbank
accession number for the clone. Each row in the reference list contains a number in
the first column that numerically identifies the column. In the subsequent data portion
of the table (pages 1 - 392), each row is similarly identified by a number in the first
column so that the name and Genbank accession number for the clone for which data
appears in that row may be determined by consulting the reference list. In the data
portion of the table, the column headings in the first row identify the tumor samples.
Each data cell in the table represents the measured Cy5/Cy3 fluorescence ratio at the
corresponding target element on the appropriate array. Empty cells indicate
insufficient or missing data. All ratio values are log transformed (base 2) to treat
inductions or repressions of identical magnitude as numerically equal but with
opposite sign.
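
The effect of the base 2 log transformation can be illustrated with a minimal
sketch. It is not the software used to prepare the tables; the function name
log2_ratio and the example ratios are hypothetical, and empty cells are
represented here by None.

```python
import math

def log2_ratio(cy5_over_cy3):
    """Return log2 of a Cy5/Cy3 ratio, or None for an empty (missing) cell."""
    if cy5_over_cy3 is None or cy5_over_cy3 <= 0:
        return None  # assumption: absent or non-positive ratios count as missing data
    return math.log2(cy5_over_cy3)

# Hypothetical cells: 4-fold induction, 4-fold repression, no change, empty cell
ratios = [4.0, 0.25, 1.0, None]
log_ratios = [log2_ratio(r) for r in ratios]
print(log_ratios)
```

In this sketch a 4-fold induction and a 4-fold repression map to values of equal
magnitude and opposite sign, which is the property the transformation is
intended to provide.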

Table 2 is a master data table for the 19 microarray experiments performed on cell line
samples, in which rows represent I.M.A.G.E. clones that identify approximately 1753
genes whose expression varied by at least a factor of 4 and columns represent
individual microarray experiments. This table contains only a data portion, in which
the column headings in the first row identify the cell lines. Each row in the table is
identified by a number which appears in the first column. The same reference list that
forms part of Table 1 may be consulted to determine the name and Genbank accession
number for the clone for which data appears in that row. Each data cell in the table
represents the measured Cy5/Cy3 fluorescence ratio at the corresponding target
element on the appropriate array. Empty cells indicate insufficient or missing data.
All ratio values are log transformed (base 2) to treat inductions or repressions of
identical magnitude as numerically equal but with opposite sign.

Table 3 presents a listing and description of the 11 cell lines used to create the
common reference sample.

Table 4 presents a complete listing of the 84 experimental samples that were assayed
versus the common reference sample. The table includes a list of alternate names (in
the column entitled Sample ID/old name) for the same tumors. The alternate names
are used to identify the tumor samples in certain contexts, and the table allows
conversion between the two sets of names.

Table 5 lists the tumors used in the experiments described herein, along with clinical 
and pathological information about each tumor/patient. 

Table 6 is a master data table for the 84 microarray experiments performed on
individual tumor, tissue, and cell line samples, in which rows represent I.M.A.G.E.
clones that identify the 496 genes in the intrinsic gene set, and columns represent
individual microarray experiments. The first 15 pages of the table consist of a
reference list in which a descriptive name for each clone (where such a name exists)
appears in the column entitled Name, followed by the Genbank accession number for
the clone. Each row in the reference list contains a number in the first column that
numerically identifies the column. In the subsequent data portion of the table (pages
1 - 91), each row is similarly identified by a number in the first column so that the
name and Genbank accession number for the clone for which data appears in that row
may be determined by consulting the reference list. In the data portion of the table,
the column headings in the first row identify the tumor samples. Each data cell in the
table represents the measured Cy5/Cy3 fluorescence ratio at the corresponding target
element on the appropriate array. Empty cells indicate insufficient or missing data.
All ratio values are log transformed (base 2) to treat inductions or repressions of
identical magnitude as numerically equal but with opposite sign.

Table 7 is a listing of the 374 clones that identify genes selected for the
epithelial-enriched gene set including Genbank accession numbers.

Table 8 is a listing of the clones that identify genes that comprise the luminal subset
including Genbank accession numbers.

Tables 9-1 and 9-2 are listings of the two groups of clones that identify genes that
comprise the basal subset including Genbank accession numbers.

Table 10 is a listing of the clones that identify genes that comprise the ErbB2 subset
including Genbank accession numbers.

Table 11 is a listing of the clones that identify genes that comprise the endothelial
gene subset including Genbank accession numbers.

Table 12 is a listing of the clones that identify genes that comprise the
stromal/fibroblast gene subset including Genbank accession numbers.

Table 13 is a listing of the clones that identify genes that comprise the B-cell gene
subset including Genbank accession numbers.

Table 14 is a listing of the clones that identify genes that comprise the
adipose-enriched/normal breast gene subset including Genbank accession numbers.

Table 15 is a listing of the clones that identify genes that comprise the macrophage
gene subset including Genbank accession numbers.

Table 16 is a listing of the clones that identify genes that comprise the T-cell gene
subset including Genbank accession numbers.

In Table 1, the Genbank accession number for each clone appears in the column
entitled "Name", following a brief descriptive name for the gene identified by the
clone, where available. In some cases the descriptive name is a number corresponding
to an I.M.A.G.E. clone ID number. As is well known and accepted in the art, the
Genbank accession number represents a means of definitively identifying a particular
clone, since Genbank accession numbers will be maintained permanently or, if
changed, the change will be accomplished in such a manner as to allow unambiguous
correlation between any new numbering system and the numbering system currently
in use.

Note that Tables 1, 2, and 6 are provided for purposes of presenting the clone
identifications and the data that was used to perform hierarchical clustering analysis,
and that the format of the tables may not correspond exactly with the format required
by software developed for the analysis of the data. Appropriate format will, in
general, depend upon the particular computer program. See, for example, the Web
site http://genome-www.stanford.edu/~sherlock/tutorial.html for discussion of the
appropriate format for one particular analysis program.

In Tables 7-16, each entry identifies a clone. The first portion of each entry is a
brief descriptive name for the gene identified by the clone. The Genbank accession
number for the clone appears on the last line of the entry for that clone.



DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

DEFINITIONS

To facilitate understanding of the invention, the following definitions are
provided. It is to be understood that, in general, terms not otherwise defined are to be
given their meaning or meanings as generally accepted in the art.

Agonist: As used herein, the term "agonist" refers to a molecule that increases or
prolongs the duration of the effect of a polypeptide or a nucleic acid. Agonists may
include proteins, nucleic acids, carbohydrates, lipids, small molecules, ions, or any
other molecules that modulate the effect of the polypeptide or nucleic acid. An
agonist may be a direct agonist, in which case it is a molecule that exerts its effect by
binding to the polypeptide or nucleic acid, or an indirect agonist, in which case it
exerts its effect via a mechanism other than binding to the polypeptide or nucleic acid
(e.g., by altering expression or stability of the polypeptide or nucleic acid, by altering
the expression or activity of a target of the polypeptide or nucleic acid, by interacting
with an intermediate in a pathway involving the polypeptide or nucleic acid, etc.).

Antagonist: As used herein, the term "antagonist" refers to a molecule that decreases
or reduces the duration of the effect of a polypeptide or a nucleic acid. Antagonists
may include proteins, nucleic acids, carbohydrates, or any other molecules that
modulate the effect of the polypeptide or nucleic acid. An antagonist may be a direct
antagonist, in which case it is a molecule that exerts its effect by binding to the
polypeptide or nucleic acid, or an indirect antagonist, in which case it exerts its effect
via a mechanism other than binding to the polypeptide or nucleic acid (e.g., by
altering expression or stability of the polypeptide or nucleic acid, by altering the
expression or activity of a target of the polypeptide or nucleic acid, by interacting with
an intermediate in a pathway involving the polypeptide or nucleic acid, etc.).

Basal cell: The term "basal cell" is a general term applied to any stratified or
pseudostratified epithelium. It refers to cells which are juxtaposed to the basement
membrane and under one or more additional epithelial layers. Mammary tissue can
have either a two cell layer epithelium (basal and luminal cells) or, in the duct
system, a single layered epithelium. In the two cell layer, the cells adjacent to the
basement membrane are termed "basal cells" and express basal cell markers (e.g.,
cytokeratin 17 and cytokeratin 5/6). In pseudostratified epithelium "non-basal" cells
can also contact the basement membrane, but since normal breast epithelium is not, in
general, pseudostratified, breast basal cells are cells located adjacent to the basement
membrane and under one or more additional layers of epithelial cells. As used herein,
the term "basal cell" is distinct from "myoepithelial cell" in that "myoepithelial cell"
refers to cells that have the contractile apparatus for milk excretion by the ducts (i.e.,
they express contractile proteins).

Breast basal cell marker: A gene whose expression is characteristic of basal cells of
normal breast lactation ducts, or an expression product of such a gene (e.g., an mRNA
or polypeptide). The marker may be used to distinguish basal cells from other cells in
the breast, e.g., luminal cells. In the case of a marker that is a polypeptide, antibodies
to the polypeptide stain cells in the basal layer of normal breast lactation ducts when
used to perform immunohistochemistry on breast tissue samples. Since the present
invention is concerned primarily with breast cancer, the term "basal cell marker" is
used interchangeably with "breast basal cell marker" herein unless otherwise
indicated. Examples of basal cell markers include the cytokeratin 5 and cytokeratin
17 genes, mRNAs, and proteins, in addition to the newly identified basal cell markers
described herein.

Breast basal tumor marker: A gene whose expression is characteristic of basal cells in
the normal breast lactation duct and which is also expressed in a subset of breast
tumors, or an expression product of such a gene. These genes include cytokeratin 5
and cytokeratin 17, which are known from the prior art to distinguish breast basal
cells from other breast tissue cells, and the genes identified herein. Antibodies to the
proteins encoded by these genes identify basal breast cells when used to perform
immunohistochemical staining of normal breast tissue, i.e., they stain cells in the basal
epithelial layer. The term "basal tumor marker" is used interchangeably with "breast
basal tumor marker" herein unless otherwise indicated.

Breast basal tumor subclass: The breast basal tumor subclass, as used herein, refers to
breast tumors that display characteristics of basal cells of normal breast lactation
ducts. Such characteristics include expression of genes whose expression has been
shown to discriminate between normal basal cells of breast lactation ducts and other
cells in the breast, including luminal cells of breast lactation ducts. These genes
include cytokeratin 5 and cytokeratin 17, which are known from the prior art to
distinguish breast basal cells from other breast tissue cells, and the genes identified
herein. Antibodies to the proteins encoded by these genes identify basal breast cells
when used to perform immunohistochemical staining of normal breast tissue, i.e., they
stain cells in the basal epithelial layer. The term "breast basal tumor subclass" is used
interchangeably with "basal tumor subclass" herein unless otherwise indicated.

WO 02/08765 PCT/US01/23843



Diagnostic information: As used herein, diagnostic information or information for
use in diagnosis is any information that is useful in determining whether a patient has
a disease or condition and/or in classifying the disease or condition into a phenotypic
category or any category having significance with regard to the prognosis of or likely
response to treatment (either treatment in general or any particular treatment) of the
disease or condition. Similarly, diagnosis refers to providing any type of diagnostic
information, including, but not limited to, whether a subject is likely to have a
condition (such as a tumor), information related to the nature or classification of a
tumor, information related to prognosis and/or information useful in selecting an
appropriate treatment. Selection of treatment may include the choice of a particular
chemotherapeutic agent or other treatment modality such as surgery, radiation, etc., a
choice about whether to withhold or deliver therapy, etc.

Differential expression: A gene exhibits differential expression at the RNA level if its
RNA transcript varies in abundance between different samples in a sample set. A
gene exhibits differential expression at the protein level if a polypeptide encoded by
the gene varies in abundance between different samples in a sample set. In the
context of a microarray experiment, differential expression generally refers to
differential expression at the RNA level.
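
One simple way to quantify variation in abundance across a sample set is
sketched below. This is an illustration only: the fold-range measure, the
default cutoff of 4, and the names fold_range and is_differentially_expressed
are assumptions made for the example and are not stated in this definition.
Inputs are base 2 log ratios, with None marking missing data.

```python
def fold_range(log2_values):
    """Fold difference between the highest and lowest measured abundance.

    Missing measurements (None) are skipped; returns None if fewer than
    two measurements are available.
    """
    present = [v for v in log2_values if v is not None]
    if len(present) < 2:
        return None
    return 2.0 ** (max(present) - min(present))

def is_differentially_expressed(log2_values, min_fold=4.0):
    """True if abundance varies by at least min_fold across the samples."""
    fold = fold_range(log2_values)
    return fold is not None and fold >= min_fold

# Hypothetical log2 ratios for two genes across four samples
gene_a = [1.5, -0.8, None, 0.2]   # varies roughly 4.9-fold
gene_b = [0.1, -0.2, 0.3, 0.0]    # varies roughly 1.4-fold
print(is_differentially_expressed(gene_a), is_differentially_expressed(gene_b))
```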

Gene: For the purposes of the present invention, the term "gene" has its meaning as
understood in the art. However, it will be appreciated by those of ordinary skill in the
art that the term "gene" has a variety of meanings in the art, some of which include
gene regulatory sequences (e.g., promoters, enhancers, etc.) and/or intron sequences,
and others of which are limited to coding sequences. It will further be appreciated
that definitions of "gene" include references to nucleic acids that do not encode
proteins but rather encode functional RNA molecules such as tRNAs. For the purpose
of clarity we note that, as used in the present application, the term "gene" generally
refers to a portion of a nucleic acid that encodes a protein; the term may optionally
encompass regulatory sequences. This definition is not intended to exclude
application of the term "gene" to non-protein coding expression units but rather to
clarify that, in most cases, the term as used in this document refers to a protein coding
nucleic acid.

Gene product or expression product: A gene product or expression product is, in
general, an RNA transcribed from the gene or a polypeptide encoded by an RNA
transcribed from the gene.

Marker: A marker, as used herein, refers to a gene whose expression is characteristic
of a particular cell type. The term may also refer to a product of gene expression, e.g.,
an RNA transcribed from the gene or a translation product of such an RNA, the
production of which is characteristic of a particular cell type. The cell type may be
defined based on any phenotypic criterion. For example, a normal breast basal cell is
defined based on its position within an epithelial layer. In some cases expression of a
marker gene may be the sole criterion used to define the cell type. The statistical
significance of the presence or absence of a marker gene expression product may vary
depending upon the particular marker. In some cases the detection of a marker is
highly specific in that it reflects a high probability that the cell is of a particular type.
This specificity may come at the cost of sensitivity, i.e., a negative result may occur
even if the cell is a cell that would be expected to express the marker. Conversely,
markers with a high degree of sensitivity may be less specific than those with lower
sensitivity. Thus it will be appreciated that a useful marker need not distinguish cells
of a particular type with 100% accuracy. Furthermore, it will be appreciated that the
use of multiple markers may improve the specificity and/or sensitivity with which a
cell can be identified as being of a particular cell type. The concept of a marker may
be applied not only to individual cells, but also to tumors or to other disease states. In
the case of tumors, a marker for a particular tumor class is a gene whose expression is
characteristic of a particular tumor type, i.e., a gene whose expression is characteristic
of some or all of the cells in the tumor. The term may also refer to a product of gene
expression, e.g., an RNA transcribed from the gene or a translation product of such an
RNA, the production of which is characteristic of a particular tumor type, i.e., of some
or all of the cells in the tumor.
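
The trade-off between sensitivity and specificity, and the effect of combining
markers, can be illustrated with a minimal sketch. The eight cells, the two
markers, and all names below are hypothetical; the example assumes that the
true cell type of each cell is known from an independent criterion.

```python
def sensitivity_specificity(marker_positive, is_cell_type):
    """Return (sensitivity, specificity) of a marker against known cell types."""
    pairs = list(zip(marker_positive, is_cell_type))
    tp = sum(1 for m, t in pairs if m and t)          # true positives
    fn = sum(1 for m, t in pairs if not m and t)      # false negatives
    tn = sum(1 for m, t in pairs if not m and not t)  # true negatives
    fp = sum(1 for m, t in pairs if m and not t)      # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical data: the first four cells are of the cell type of interest
truth    = [True, True, True, True, False, False, False, False]
marker_a = [True, True, True, False, False, False, False, False]
marker_b = [True, True, False, True, True, False, False, False]

either = [a or b for a, b in zip(marker_a, marker_b)]   # positive for A or B
both   = [a and b for a, b in zip(marker_a, marker_b)]  # positive for A and B

for name, calls in [("A", marker_a), ("B", marker_b),
                    ("A or B", either), ("A and B", both)]:
    print(name, sensitivity_specificity(calls, truth))
```

In this example, requiring either marker raises sensitivity, while requiring
both markers raises specificity at the cost of sensitivity.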

Prognostic information and predictive information: As used herein the terms
prognostic information and predictive information are used interchangeably to refer to
any information that may be used to foretell any aspect of the course of a disease or
condition either in the absence or presence of treatment. Such information may
include, but is not limited to, the average life expectancy of a patient, the likelihood
that a patient will survive for a given amount of time (e.g., 6 months, 1 year, 5 years,
etc.), the likelihood that a patient will be cured of a disease, and the likelihood that a
patient's disease will respond to a particular therapy (wherein response may be
defined in any of a variety of ways). Prognostic and predictive information are
included within the broad category of diagnostic information.
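
The likelihood that a patient will survive for a given amount of time is
commonly estimated from follow-up data with the Kaplan-Meier method, the type
of survival curve referred to in the description of Figure 5. The following is
a minimal sketch of that estimator, not the analysis performed for the
figures; the follow-up times (in months) and outcomes are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times  : follow-up time for each patient
    events : True if death was observed at that time, False if censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve

# Hypothetical cohort of six patients
times = [6, 12, 12, 24, 36, 60]
events = [True, False, True, True, False, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

A patient censored at the same time as an observed death is counted as still
at risk at that time, which is the usual convention.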

Response: As used herein a response to treatment may refer to any beneficial
alteration in a subject's condition that occurs as a result of treatment. Such alteration
may include stabilization of the condition (e.g., prevention of deterioration that would
have taken place in the absence of the treatment), amelioration of symptoms of the
condition, improvement in the prospects for cure of the condition, etc. One may refer
to a subject's response or to a tumor's response. In general these concepts are used
interchangeably herein. Tumor or subject response may be measured according to a
wide variety of criteria, including clinical criteria and objective criteria. Techniques
for assessing response include, but are not limited to, clinical examination, chest
X-ray, CT scan, MRI, ultrasound, endoscopy, laparoscopy, presence or level of tumor
markers in a sample obtained from a subject, cytology, and histology. Many of these
techniques attempt to determine the size of a tumor or otherwise determine the total
tumor burden. Methods and guidelines for assessing response to treatment are
discussed in Therasse P., et al., "New guidelines to evaluate the response to treatment
in solid tumors", European Organization for Research and Treatment of Cancer,
National Cancer Institute of the United States, National Cancer Institute of Canada. J
Natl Cancer Inst, Feb 2;92(3):205-16, 2000. The exact response criterion can be
selected in any appropriate manner, provided that when comparing groups of tumors
and/or patients, the groups to be compared are assessed based on the same or
comparable criteria for determining response rate. One of ordinary skill in the art will
be able to select appropriate criteria.
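
As one illustration of applying a single response criterion uniformly, the
sketch below classifies response from the sum of the longest diameters of
measured lesions. The 30% decrease and 20% increase thresholds are those
commonly associated with the guidelines cited above and should be checked
against that publication; the sketch is a simplification that ignores new
lesions, non-measured lesions, and confirmation requirements, and the
function name and measurements are hypothetical.

```python
def classify_response(baseline_sum, current_sum, smallest_sum):
    """Simplified response category from sums of longest lesion diameters (mm).

    baseline_sum : sum measured before treatment
    current_sum  : sum at the current assessment
    smallest_sum : smallest sum recorded so far
    """
    if current_sum == 0:
        return "complete response"
    if current_sum >= 1.2 * smallest_sum:   # at least 20% above the smallest sum
        return "progressive disease"
    if current_sum <= 0.7 * baseline_sum:   # at least 30% below baseline
        return "partial response"
    return "stable disease"

# Hypothetical assessments
print(classify_response(100, 60, 60))   # partial response
print(classify_response(100, 90, 70))   # progressive disease
print(classify_response(100, 85, 85))   # stable disease
```

Applying one such function to every tumor in each group is one way to ensure
that the groups are assessed on the same criteria.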

Sample: As used herein, a sample obtained from a subject may include, but is not
limited to, any or all of the following: a cell or cells, a portion of tissue, blood, serum, 
ascites, urine, saliva, and other body fluids, secretions, or excretions. The term 
"sample" also includes any material derived by processing such a sample. Derived 
samples may include nucleic acids or proteins extracted from the sample or obtained 
by subjecting the sample to techniques such as amplification or reverse transcription 
of mRNA, etc. 

Specific binding: As used herein, the term refers to an interaction between a target 
polypeptide (or, more generally, a target molecule) and a binding molecule such as an 
antibody, agonist, or antagonist. The interaction is typically dependent upon the 
presence of a particular structural feature of the target polypeptide such as an 
antigenic determinant or epitope recognized by the binding molecule. For example, if 
an antibody is specific for epitope A, the presence of a polypeptide containing epitope 
A or the presence of free unlabeled A in a reaction containing both free labeled A and 
the antibody thereto will reduce the amount of labeled A that binds to the antibody.
It is to be understood that specificity need not be absolute. For example, it is well 
known in the art that numerous antibodies cross-react with other epitopes in addition 
to those present in the target molecule. Such cross-reactivity may be acceptable 
depending upon the application for which the antibody is to be used. One of ordinary 
skill in the art will be able to select antibodies having a sufficient degree of specificity 
to perform appropriately in any given application (e.g., for detection of a target 
molecule, for therapeutic purposes, etc.). It is also to be understood that specificity
may be evaluated in the context of additional factors such as the affinity of the 
binding molecule for the target polypeptide versus the affinity of the binding molecule 
for other targets, e.g., competitors. If a binding molecule exhibits a high affinity for a 
target molecule that it is desired to detect and low affinity for nontarget molecules, the 
binding molecule will likely be an acceptable reagent for immunodiagnostic purposes. Once
the specificity of a binding molecule is established in one or more contexts, it may be 
employed in other, preferably similar, contexts without necessarily re-evaluating its 
specificity. 
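
The competition example above, and the comparison of affinities for target
and nontarget molecules, can be illustrated with a simple equilibrium model.
The sketch assumes a single class of binding sites, ligand concentrations
well in excess of the antibody concentration, and hypothetical dissociation
constants and concentrations in arbitrary units; it is not a description of
any assay used herein.

```python
def labeled_occupancy(labeled, kd_labeled, competitor=0.0, kd_competitor=1.0):
    """Fraction of antibody sites occupied by labeled ligand at equilibrium.

    Single-site competitive binding, assuming free ligand concentrations are
    approximately equal to total concentrations.
    """
    a = labeled / kd_labeled
    c = competitor / kd_competitor
    return a / (1.0 + a + c)

# Labeled A alone
alone = labeled_occupancy(labeled=1.0, kd_labeled=1.0)
# Unlabeled A added in 8-fold excess (same affinity as labeled A)
with_unlabeled_a = labeled_occupancy(1.0, 1.0, competitor=8.0, kd_competitor=1.0)
# An unrelated molecule with 1000-fold weaker affinity, same concentration
with_nontarget = labeled_occupancy(1.0, 1.0, competitor=8.0, kd_competitor=1000.0)

print(alone, with_unlabeled_a, round(with_nontarget, 3))
```

Under these assumptions, unlabeled A sharply reduces binding of labeled A,
whereas a low-affinity nontarget molecule has little effect.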

Treating a tumor: As used herein, treating a tumor is taken to mean treating a subject
who has the tumor. 

Tumor sample: The term "tumor sample" as used herein is taken broadly to include
cell or tissue samples removed from a tumor, cells (or their progeny) derived from a
tumor that may be located elsewhere in the body (e.g., cells in the bloodstream or at a
site of metastasis), or any material derived by processing such a sample. Derived
tumor samples may include nucleic acids or proteins extracted from the sample or
obtained by subjecting the sample to techniques such as amplification or reverse
transcription of mRNA, etc.

Tumor subclass: A tumor subclass, also referred to herein as a tumor subset or tumor
class, is the group of tumors that display one or more phenotypic or genotypic
characteristics that distinguish members of the group from other tumors.

I. Overview and Description of the Basal Marker Genes, Polynucleotides, and 
Polypeptides 

The present invention provides new reagents and methods for the management
(e.g., detection, classification, provision of diagnostic and prognostic information,
treatment, etc.) of breast cancer. Significant progress has been made in understanding
risk factors, including genetic factors, that may contribute to breast cancer (see, for
example, Vogelstein, B. and Kinzler, eds., "Breast Cancer", by Couch, F. and Weber,
B. in The Genetic Basis of Human Cancer, McGraw Hill, 1998), but the relevance of
these factors to clinical outcome remains unclear. The most powerful prognosticators
are clinical features such as lymph node status, tumor size, and tumor grade. In
addition, the expression level and antibody staining pattern of several proteins are
predictive of outcome and of the likelihood of response to therapy. However, the
clinical outcome of individual patients remains uncertain. In addition, the ability to
predict which patients are likely to benefit from a particular type of therapy (e.g., a
certain drug or class of drug) remains elusive.

The invention encompasses the realization that high throughput analysis
techniques, e.g., those involving the use of cDNA microarrays, can be used to provide
new insights into the biology of breast cancer. By analyzing the transcriptional
profiles of a large number of breast tumor samples and by undertaking comparisons,
e.g., between tumors associated with varying prognoses, between primary tumors and
metastases, between tumors before and after treatment, and between tumors with
differing responses to therapy, the present invention provides new tools and methods
for classifying tumors and defines new classes of tumors based on these methods.
The invention identifies genes and gene subsets that are useful in classifying breast
tumors. In addition, the methods described herein identify genes that are likely to
play a role in breast cancer development, progression, and/or response to therapy.
Classification based on expression of particular genes may be used to predict clinical
course or to predict sensitivity to chemotherapeutic agents. Ultimately such
classification may be used to guide selection of appropriate therapy. As described
herein, detection of mRNA and protein corresponding to differentially expressed
genes provides new methods of use in cancer prognosis, diagnosis, and treatment
selection. In addition, differentially expressed genes and their encoded proteins
provide targets for the identification of new therapies for breast cancer.

As described in further detail below, the invention employs methods for
clustering genes into groups by determining their expression patterns across a set of
samples obtained from breast tumors and from normal breast tissue. The invention
also clusters the breast tumor and normal breast tissue samples into groups based on
similarities in their expression of a set of genes. This two-dimensional clustering
approach permits the association of particular classes of tumors with particular subsets
of genes that, for example, show relatively high levels of expression in the tumors.
Correlation with clinical information indicates that the tumor classes have clinical
significance in terms of prognosis or response to chemotherapy.
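
The two-dimensional clustering approach can be sketched as follows: the same
agglomerative procedure is applied first to the rows (genes) and then to the
columns (samples) of a matrix of log ratios. The choice of average linkage,
of one minus the Pearson correlation as the distance, the cut into two
clusters, and the small data matrix are all assumptions made for this
illustration and are not taken from the analysis described herein.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(vectors, n_clusters):
    """Merge the closest clusters until n_clusters remain; returns index lists."""
    clusters = [[i] for i in range(len(vectors))]

    def dist(i, j):
        return 1.0 - pearson(vectors[i], vectors[j])

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist(i, j) for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical log2 ratios: rows are genes 0-3, columns are samples 0-3
matrix = [
    [ 2.0,  1.8, -1.5, -1.7],
    [ 1.9,  2.1, -1.6, -1.4],
    [-1.2, -1.0,  1.5,  1.3],
    [-0.9, -1.1,  1.1,  1.4],
]
gene_clusters = average_linkage(matrix, 2)
sample_clusters = average_linkage([list(col) for col in zip(*matrix)], 2)
print(gene_clusters, sample_clusters)
```

In this example genes 0 and 1 show relatively high expression in samples 0
and 1, so the gene clusters and sample clusters can be associated with one
another in the manner described above.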

Genes that are relatively overexpressed in tumors may be particularly 
appropriate targets for the development of new therapeutic agents. Any gene (or 

30 combination of genes) that is overexpressed in some tumors forms a basis by which 
tumors can be divided into different groups. As demonstrated herein, when particular 
sets of genes are used such groups have clinical significance in that, for example, they 






WO 02/08765 PCT/US01/23843 



display differences in prognosis. However, regardless of whether the resulting 
division has significance in terms of known clinical parameters, therapeutic agents 
directed towards such genes or towards their encoded proteins would be expected to 
be specific for the tumors that overexpress the genes. Thus the invention offers an
opportunity for the development and selection of therapeutic agents based on specific
properties of a tumor. In other words, any gene that is overexpressed in a subset of 
tumors can be used to define that subclass and is a potential target for the 
development of a therapeutic agent that is specific for that tumor subclass. 

In particular, tumors that display characteristics of basal cells of the normal
breast lactation gland (also referred to herein as breast basal cells) form a distinct
subclass (referred to herein as the basal subclass). It is known in the art that two
distinct types of epithelial cells are found in the adult human mammary gland: basal
cells and luminal epithelial cells. Expression of cytokeratin 5 and/or cytokeratin 17 is
a characteristic of basal cells of the normal mammary lactation gland, while
cytokeratins 8 and 18 are expressed in luminal cells. Cytokeratins are a family of
intermediate filament proteins, members of which are found in most or all epithelial
cell types (Moll, R., et al., "The catalog of human cytokeratins: patterns of expression
in normal epithelia, tumors, and cultured cells", Cell, 31(1), 11-24, 1982).
Intermediate-sized filaments are morphologically similar but biochemically and
immunologically distinguishable cytoplasmic proteins of which five major filament
types have been identified (cytokeratin, vimentin, desmin, neurofilament protein, glia
filament protein), and antibodies to these proteins have been used for distinguishing
different cell types and tumors derived therefrom. Epithelial and carcinoma cells are
characterized by the presence of cytokeratin filaments that can be identified by
antibodies. These antibodies can be used to distinguish between different cell and
tumor types (Debus, E., et al., "Immunohistochemical distinction of human
carcinomas by cytokeratin typing with monoclonal antibodies", Am J Pathol, 114(1):
121-30, 1984). In particular, antibodies against cytokeratins 5/6, 17, 8, and 18 may be
used to distinguish between breast basal and luminal cell types in normal breast and in
tumors (See, e.g., Purkis, P., et al., "Antibody markers of basal cells in complex
epithelia", J. Clin. Pathol, 48:26-32, 1990; Taylor-Papadimitriou and Lane, E.,
"Keratin expression in the mammary gland" in Neville, M and Daniel C, eds. The
Mammary Gland: Development, Regulation, and Function, New York: Plenum, pp.
181-215, 1987; Dairkee, S., et al., "Immunolocalization of a human basal epithelium-
specific keratin in benign and malignant breast disease", Breast Cancer Res. Treat,
10:11-20, 1987).

Several previous studies suggested that expression of basal cell keratins is
associated with a poor clinical outcome (Dairkee, S.H., et al., "Monoclonal antibody
that predicts early recurrence of breast cancer", Lancet, 1:514, 1987; Malzahn, K., et
al., "Biological and prognostic significance of stratified epithelial cytokeratins in
infiltrating ductal breast carcinomas", Virchows Archiv, 433:119-29, 1998). The inventors
have confirmed, in a large-scale study, that patients with breast tumors whose cells
display characteristics of breast basal cells, e.g., expression of cytokeratin 5 and/or
cytokeratin 17, have a poor clinical outcome relative to patients with breast tumors
that do not express these markers. However, antibodies to these cytokeratins have
been found (by the inventors and by other investigators) to give spotty, focal staining
patterns when used to perform immunohistochemistry on breast tumor samples. Thus
the utility of cytokeratins 5 and 17 as markers and the utility of antibodies that bind to
cytokeratin 5 or 17 for determining whether a tumor is a member of the basal subclass
have been limited.

The inventors have therefore identified genes whose mRNA expression profiles
across a large set of tumor samples correlate with, i.e., are similar to, the expression
profiles of the known basal cell markers cytokeratins 5 and 17. These genes include
the basal marker genes of the present invention, i.e., genes that encode cadherin3 or
P-cadherin (SEQ ID NO:1; GenBank protein accession number NP_001399; GenBank
cDNA accession number NM_001408), matrix metalloproteinase 14 (SEQ ID NO:2;
GenBank protein accession number NP_004986; GenBank cDNA accession number
NM_004995), and cadherin EGF LAG seven-pass G-type receptor 2 or EGF-Like
Domain, Multiple 2 (SEQ ID NO:3; GenBank protein accession number NP_001784;
GenBank cDNA accession number NM_001793). A portion of the cadherin3 gene
was present as I.M.A.G.E. clone 777301 on the cDNA microarray described below.
This clone is entry #421 in Table 1. A portion of the matrix metalloproteinase 14
gene was present as I.M.A.G.E. clone 270505 on the cDNA microarray described
below. This clone is entry #424 in Table 1. A portion of the cadherin EGF LAG
seven-pass G-type receptor 2 gene was present as I.M.A.G.E. clone 175103 on the
cDNA microarray described below. This clone is entry #1443 in Table 1.
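
By way of illustration only, the selection of genes whose expression profiles correlate
with those of cytokeratins 5 and 17 may be sketched as follows. This is a minimal sketch
and not the procedure actually used by the inventors: the use of the Pearson coefficient,
the averaging over the two markers, the gene labels GENE_A and GENE_B, and all numerical
values are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_by_marker_correlation(profiles, marker_names):
    """Rank non-marker genes by their mean correlation with the marker profiles."""
    scores = {}
    for gene, values in profiles.items():
        if gene in marker_names:
            continue
        rs = [pearson(values, profiles[m]) for m in marker_names]
        scores[gene] = sum(rs) / len(rs)
    # Highest mean correlation first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical log2 expression ratios across six tumor samples (illustrative only).
profiles = {
    "KRT5":   [2.1, 1.8, -0.5, -1.2, 2.4, -0.9],
    "KRT17":  [1.9, 2.2, -0.7, -1.0, 2.0, -1.1],
    "GENE_A": [1.7, 1.5, -0.4, -0.8, 2.2, -0.6],   # tracks the two markers
    "GENE_B": [-1.0, -1.3, 0.9, 1.4, -1.6, 0.8],   # varies in the opposite direction
}
ranking = rank_by_marker_correlation(profiles, ["KRT5", "KRT17"])
```

In this sketch a gene such as GENE_A, which rises and falls with the two cytokeratins,
appears at the top of the ranking and would be a candidate basal marker, whereas GENE_B
would not.
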

Information about these genes may be found at NCBI's LocusLink
(http://www.ncbi.nlm.nih.gov/LocusLink), among other sources. As described in
Examples 10 and 13, the inventors have generated antibodies to the proteins expressed
by these genes and shown that the antibodies stain basal cells of normal mammary
lactation glands. Thus detection of one or more expression products of these genes
may be used to identify tumors that fall within the basal tumor subclass.

As is well known in the art, breast carcinomas lose the typical histology and
architecture of normal breast glands. Generally, carcinoma cells overgrow the normal
cells and lose their ability to differentiate into glandular-like structures. The degree of
loss of differentiation in general is related to the aggressiveness of the tumor. For
example, "in situ" carcinoma by definition retains the basement membrane intact,
whereas as it progresses to "invasive", the tumor breaks through the basement
membrane. Thus one would not expect to see, within breast carcinomas, staining of
a discrete layer of basal cells as seen in normal breast tissue. For a discussion of the
physiology and histology of normal breast and breast carcinoma, see Ronnov-Jessen,
L., Petersen, O. W. & Bissell, M. J. Cellular changes involved in conversion of
normal to malignant breast: importance of the stromal reaction. Physiol Rev 76,
69-125 (1996).

The basal marker genes provided herein are expressed in the best model of
basal cells (HMECs, Human Mammary Epithelial Cells) and, based on antibody
staining, in normal breast basal cells. Therefore describing them as basal markers is
appropriate. However, in addition to their specific staining properties, a major
characteristic that makes these genes and their expression products useful is their
variation in expression across cohorts of breast carcinoma patients, which portends
their utility in stratification of breast carcinoma patients. While not wanting to be
limited by the implications of having chosen a particular descriptor (i.e., "basal"),
the inventors refer to the set of genes, proteins, and antibody reactivity patterns as
"basal" as it serves as a reminder of their utility in recognizing breast tumor cells
that have characteristics reminiscent of normal breast basal cells. Breast tumors
containing such cells are likewise referred to as "basal" without intending any
limitations thereby.

Two of the basal marker genes, cadherin3 and cadherin EGF LAG seven-pass
G-type receptor 2, encode members of the cadherin superfamily. The cadherin EGF
LAG seven-pass G-type receptor 2 or EGF-Like Domain, Multiple 2 protein is a
member of the flamingo subfamily, part of the cadherin superfamily. The cadherins
are a large family of proteins with critical roles in the regulation of cell-cell adhesion.
Generally expressed in development- or tissue-specific manners, these factors have
been shown to have important roles in development, cellular proliferation, and
differentiation. The cadherin superfamily includes classic cadherins, desmogleins,
desmocollins, protocadherins, CNRs, Fats, and seven-pass transmembrane cadherins
(for review see Nollet et al. 2000). Typically transmembrane proteins, the cadherins
are characterized by the unique cadherin, or EC, domain. These cadherin domains,
which are involved in Ca2+ binding (Takeichi 1990), are repeated in the extracellular
region of all of the family members. The amino acid sequences of other regions show
significant divergence among members, suggesting functional diversity amongst the
various cadherin proteins. However, among the members of each subfamily, the
cytoplasmic domains are conserved. In the classic cadherins, which are components of
adherens junctions and desmoplakin plaques, this region interacts with catenin p120ctn,
and plakoglobin or β-catenin. The latter binds to α-catenin, and this molecular
complex further associates with α-actinin, F-actin and other cytoskeletal proteins.
Consistent with their roles in regulating cell-cell adhesion events, altered expression
of cadherin genes has been associated with human cancer. Alteration of cadherin
function may lead to subsequent metastasis by disaggregation of tumor cells, and one
proposed role of many cadherins studied to date is as tumor- and invasion-
suppressors. Further discussion of some of the many members of the cadherin
superfamily and their possible role in cancer is found in references 53-61.

The flamingo subfamily consists of nonclassic-type cadherins, a subpopulation
that does not interact with catenins. The flamingo cadherins are located at the plasma
membrane and have nine cadherin domains, seven epidermal growth factor-like
repeats and two laminin A G-type repeats in their ectodomain. They also have seven
transmembrane domains, a characteristic unique to this subfamily. While not wishing
to be bound by any theory, it is postulated that these proteins are receptors involved in
contact-mediated communication, with cadherin domains acting as homophilic
binding regions and the EGF-like domains involved in cell adhesion and receptor-
ligand interactions. The cadherin EGF LAG seven-pass G-type receptor 2 gene (also
known as CELSR2) has not been as extensively studied as the classic cadherins, but is
implicated in cell signaling. The Drosophila homolog of this gene has been studied in
more detail, and is clearly important in regulating different cellular events (Usui T,
Shima Y, Shimada Y, Hirano S, Burgess RW, Schwarz TL, Takeichi M, Uemura T,
"Flamingo, a seven-pass transmembrane cadherin, regulates planar cell polarity under
the control of Frizzled", Cell 1999 Sep 98:585-95).

While not wishing to be bound by any theory, it is postulated that this protein is a 
receptor involved in contact-mediated communication, with the cadherin domains 
acting as homophilic binding regions and the EGF-like domains involved in cell 
adhesion and receptor-ligand interactions. 

Proteins of the matrix metalloproteinase (MMP) family are involved in the
breakdown of extracellular matrix in normal physiological processes, such as
embryonic development, reproduction, and tissue remodeling, as well as in disease
processes, such as arthritis and metastasis. Most MMPs are secreted as inactive
proproteins which are activated when cleaved by extracellular proteinases. However,
matrix metalloproteinase 14 protein is a member of the membrane-type MMP
(MT-MMP) subfamily; each member of this subfamily contains a potential
transmembrane domain, suggesting that these proteins are expressed at the cell
surface rather than secreted. This protein activates MMP2 protein, and this
activity may be involved in tumor invasion.

Cadherin3 is predicted to be membrane bound, with an extracellular portion.
As indicated by the presence of seven putative transmembrane domains, cadherin
EGF LAG seven-pass G-type receptor 2 is also likely to be a membrane bound
protein. The presence of a predicted transmembrane domain indicates that matrix
metalloproteinase 14 is also membrane bound. The likelihood that the proteins
encoded by the basal marker genes are membrane bound makes them attractive
candidates for the application of serological assays for diagnostic purposes. In addition,
the likelihood that cadherin3, cadherin EGF LAG seven-pass G-type receptor 2, and
matrix metalloproteinase 14 are membrane bound makes them attractive candidates
for antibody therapeutics.

The invention provides antibodies that specifically bind to the polypeptide
expression products of the basal marker genes, i.e., the polypeptides of SEQ ID NO:1,
2, and 3. The antibodies stain basal cells of the normal mammary lactation gland. In
certain embodiments of the invention the antibodies distinguish basal cells from 
luminal cells in normal mammary lactation glands. 

The antibodies are potentially useful as therapeutic reagents for cancer,
particularly breast cancer, either by themselves or when conjugated to or delivered
with another molecule such as a toxic compound. The invention further provides
pharmaceutical compositions comprising agonists or antagonists of the
polynucleotides and their encoded polypeptides, and methods of use thereof for the
treatment of cancer. The invention includes a variety of methods for providing
information of use in the prognosis, classification, diagnosis, etc. of cancer,
particularly breast cancer.

In order that the manner in which the basal cell marker genes of the present
invention were identified may be better understood, a description of cDNA microarray
technology is provided below. Following this description the specific experimental
approach employed herein is described. Certain aspects of the invention are then
described in further detail.

II. cDNA Microarray Technology 

cDNA microarrays consist of multiple (usually thousands of) different cDNAs
spotted (usually using a robotic spotting device) onto known locations on a solid
support, such as a glass microscope slide. The cDNAs are typically obtained by PCR
amplification of plasmid library inserts using primers complementary to the vector
backbone portion of the plasmid or to the gene itself for genes where sequence is
known. PCR products suitable for production of microarrays are typically between
0.5 and 2.5 kb in length. Full length cDNAs, expressed sequence tags (ESTs), or
randomly chosen cDNAs from any library of interest can be chosen. ESTs are
partially sequenced cDNAs as described, for example, in L. Hillier, et al., Generation
and analysis of 280,000 human expressed sequence tags, Genome Research, 6,
807-828, 1996. The afore-mentioned article is herein incorporated by reference, as
are the entire teachings of all other patents and journal articles mentioned herein, for
all purposes and not just those related to the particular context in which they are
mentioned. Although some ESTs correspond to known genes, frequently very little or
no information regarding any particular EST is available except for a small amount of
3' and/or 5' sequence and, possibly, the tissue of origin of the mRNA from which the
EST was derived. As will be appreciated by one of ordinary skill in the art, in general
the cDNAs contain sufficient sequence information to uniquely identify a gene within
the human genome. Furthermore, in general the cDNAs are of sufficient length to
hybridize, preferably specifically and yet more preferably uniquely, to cDNA obtained
from mRNA derived from a single gene under the hybridization conditions of the
experiment.

In a typical microarray experiment, a microarray is hybridized with
differentially labeled RNA or DNA populations derived from two different samples.
Most commonly RNA (either total RNA or poly A+ RNA) is isolated from cells or
tissues of interest and is reverse transcribed to yield cDNA. Labeling is usually
performed during reverse transcription by incorporating a labeled nucleotide in the
reaction mixture. Although various labels can be used, most commonly the
nucleotide is conjugated with the fluorescent dyes Cy3 or Cy5. For example, Cy5-
dUTP and Cy3-dUTP can be used. cDNA derived from one sample (representing, for
example, a particular cell type, tissue type or growth condition) is labeled with one
fluor while cDNA derived from a second sample (representing, for example, a
different cell type, tissue type, or growth condition) is labeled with the second fluor.
Similar amounts of labeled material from the two samples are cohybridized to the
microarray. In the case of a microarray experiment in which the samples are labeled
with Cy5 (which fluoresces red) and Cy3 (which fluoresces green), the primary data
(obtained by scanning the microarray using a detector capable of quantitatively
detecting fluorescence intensity) are ratios of fluorescence intensity (red/green, R/G).
These ratios represent the relative concentrations of cDNA molecules that hybridized
to the cDNAs represented on the microarray and thus reflect the relative expression
levels of the mRNA corresponding to each cDNA/gene represented on the microarray.
Each microarray experiment can provide tens of thousands of data points, each
representing the relative expression of a particular gene in the two samples.
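
As a minimal sketch of how the red/green ratio for a single array element might be
computed, consider the following. The log2 transformation is a common convention for
such ratios and is an assumption here, not a step stated in this description; the
function name, the element labels, and the intensity values are likewise hypothetical.

```python
import math

def log2_ratio(cy5_intensity, cy3_intensity):
    """Return log2(R/G) for one array element; positive means higher in the Cy5 sample."""
    if cy5_intensity <= 0 or cy3_intensity <= 0:
        raise ValueError("intensities must be positive after background correction")
    return math.log2(cy5_intensity / cy3_intensity)

# Hypothetical background-corrected intensities: (Cy5 red, Cy3 green).
spots = {
    "element_1": (8000.0, 2000.0),   # 4-fold higher in the Cy5-labeled sample
    "element_2": (1500.0, 1500.0),   # equal in both samples
    "element_3": (500.0, 4000.0),    # 8-fold lower in the Cy5-labeled sample
}
log_ratios = {name: log2_ratio(r, g) for name, (r, g) in spots.items()}
```

Working in log space makes a 4-fold increase and a 4-fold decrease symmetric about
zero, which simplifies the later comparison of genes across samples.
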
Appropriate organization and analysis of the data is of key importance. Various
computer programs that incorporate standard statistical tools have been developed to
facilitate data analysis. One basis for organizing gene expression data is to group
genes with similar expression patterns together into clusters. A method for
performing hierarchical cluster analysis and display of data derived from microarray
experiments is described in Eisen, M., Spellman, P., Brown, P., and Botstein, D.,
Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad.
Sci. USA, 95:14863-14868, 1998. As described therein, clustering can be combined
with a graphical representation of the primary data in which each data point is
represented with a color that quantitatively and qualitatively represents that data point.
By converting the data from a large table of numbers into a visual format, this
process facilitates an intuitive analysis of the data. Additional information and details
regarding the mathematical tools and/or the clustering approach itself may be found,
for example, in Sokal, R.R. & Sneath, P.H.A. Principles of numerical taxonomy, xvi,
359, W. H. Freeman, San Francisco, 1963; Hartigan, J.A. Clustering algorithms, xiii,
351, Wiley, New York, 1975; Paull, K.D. et al. Display and analysis of patterns of
differential activity of drugs against human tumor cell lines: development of mean
graph and COMPARE algorithm. J Natl Cancer Inst 81, 1088-92, 1989; Weinstein,
J.N. et al. Neural computing in cancer drug development: predicting mechanism of
action. Science 258, 447-51, 1992; van Osdol, W.W., Myers, T.G., Paull, K.D.,
Kohn, K.W. & Weinstein, J.N. Use of the Kohonen self-organizing map to study the
mechanisms of action of chemotherapeutic agents. J Natl Cancer Inst 86, 1853-9,
1994; and Weinstein, J.N. et al. An information-intensive approach to the molecular
pharmacology of cancer. Science, 275, 343-9, 1997.

Further details of the experimental methods used in the present invention are
found in the Examples. Additional information describing methods for fabricating
and using microarrays is found in U.S. Patent No. 5,807,522, which is herein
incorporated by reference. Instructions for constructing microarray hardware (e.g.,
arrayers and scanners) using commercially available parts can be found at
http://cmgm.stanford.edu/pbrown/ and in Cheung, V., Morley, M., Aguilar, F.,
Massimi, A., Kucherlapati, R., and Childs, G., Making and reading microarrays,
Nature Genetics Supplement, 21:15-19, 1999, which are herein incorporated by
reference. Additional discussions of microarray technology and protocols for
preparing samples and performing microarray experiments are found in, for example,
DNA arrays for analysis of gene expression, Methods Enzymol, 303:179-205, 1999;
Fluorescence-based expression monitoring using microarrays, Methods Enzymol, 306:
3-18, 1999; and M. Schena (ed.), DNA Microarrays: A Practical Approach, Oxford
University Press, Oxford, UK, 1999. Descriptions of how to use an arrayer and the
associated software are found at
http://cmgm.stanford.edu/pbrow^ which is
herein incorporated by reference.

III. Experimental Approach of the Invention 

The present invention encompasses the realization that genes that are
differentially expressed are of use in classifying tumors. Differentially expressed
genes are likely to be responsible for the different phenotypic characteristics of
tumors. The present invention identifies such genes. In general, a differentially
expressed gene is a gene whose transcript abundance varies between different
samples, e.g., between different tumor samples, between normal versus tumor
samples, etc. In the case of the experiment described herein, the transcript level of a
differentially expressed gene varies by at least 4-fold from its average abundance in a
given sample set in at least 3 of the samples. However, genes that display smaller
variations in expression are also within the scope of the invention. In general, the
amount by which the expression varies and the number of samples in which the
expression varies by that amount will depend upon the number of samples and the
particular characteristics of the samples. One skilled in the art will be able to
determine, based on knowledge of the samples, what constitutes a significant degree
of differential expression.
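
The 4-fold, 3-sample criterion stated above may be expressed, for illustration, as a
simple filter. This is a sketch under stated assumptions rather than the inventors'
actual code: expression values are assumed to be log2 ratios, the "average abundance"
is assumed to be the arithmetic mean taken in log space, and the function name and
example values are hypothetical.

```python
import math
from statistics import mean

def is_differentially_expressed(log2_ratios, min_fold=4.0, min_samples=3):
    """True if the gene deviates >= min_fold from its own average in >= min_samples samples."""
    center = mean(log2_ratios)        # average abundance, taken in log space (assumption)
    threshold = math.log2(min_fold)   # a 4-fold change corresponds to 2 log2 units
    hits = sum(1 for v in log2_ratios if abs(v - center) >= threshold)
    return hits >= min_samples

# Hypothetical log2 ratios for three genes across a small sample set.
variable_gene = [3.0, 3.0, 3.0, -3.0, -3.0, -3.0]            # six samples deviate 8-fold
flat_gene = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]                 # no sample reaches 4-fold
two_outliers = [4.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # only two samples qualify

results = [is_differentially_expressed(g)
           for g in (variable_gene, flat_gene, two_outliers)]
```

Both thresholds are exposed as parameters because, as noted above, the appropriate
fold change and sample count depend on the number and character of the samples.
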

While analysis of multiple genes is of use in developing a robust classification
of tumors, each of the differentially expressed genes and their encoded proteins is a
target for the development of diagnostic and therapeutic agents. Investigation of
variation in individual genes in breast tumors reveals that molecular variation can be
related to important features of clinical variation. For example, studies of expression
of the estrogen receptor alpha gene (ESR1) and the Erb-B2/HER2/neu oncogene, and of
the mutational status at the TP53, BRCA1 and BRCA2 loci, have each linked molecular
differences to clinical behavior. (Discussed, for
example, in Osborne, C.K., et al., The value of estrogen and progesterone receptors in
the treatment of breast cancer, Cancer 46, 2884-2888, 1980; Ingvarsson, S.,
Molecular genetics of breast cancer progression, Seminars in Cancer Biology, 9, 277-
288, 1999;
Breast Cancer Linkage Consortium, Pathology of familial breast cancer: differences
between breast cancers in carriers of BRCA1 and BRCA2 mutations and sporadic
cases, Lancet, 349, 1505-1510, 1997; Anderson, T. I., et al., Prognostic significance of
TP53 alterations in breast carcinoma. Br J Cancer, 68, 540-548, 1993 and references
cited in these articles). In particular, approximately 60% to 70% of breast tumors
express the estrogen receptor, and this expression has been shown to be a favorable
prognostic factor (reviewed in Allred, D.C., et al. Prognostic and Predictive Factors in
Breast Cancer by Immunohistochemical Analysis, Modern Pathology, 11(2), 155-168,
1998).

As described in more detail in Examples 1, 2, and 4, cDNA microarrays each
representing the same set of approximately 8100 different human genes were
produced. The human cDNA clones used to produce the microarrays contained
approximately 4000 named genes, 2000 genes with homology to named genes in other
species, and approximately 2000 ESTs of unknown function. An mRNA sample was
obtained from each of a set of 84 tissue samples or cell lines. The expression levels of
the approximately 8100 genes were measured in each mRNA sample by hybridization
to an individual microarray, yielding an expression profile for each gene across the
experimental samples. Although more details will be found in the Examples, an
overview of the experimental procedure is presented here so that the invention may be
better understood.

Variation in patterns of gene expression was characterized in 62 breast tumor
samples from 40 different patients, 3 normal breast tissue samples, and 19 samples
from 17 cultured human cell lines (one of which was sampled 3 times under different
conditions). Twenty of the tumors had been sampled twice, before and after a 16 week
course of doxorubicin chemotherapy, and two tumors were paired with a lymph node
metastasis from the same patient. The other 18 tumor samples were single samples
from individual tumors. A detailed listing of the tumor samples and various
characteristics including clinical estrogen receptor and Erb-B2 status as assessed using
antibody staining, estrogen receptor and Erb-B2 status as assessed by microarray
result, tumor grade, differentiation, survival status and time, age at diagnosis,
doxorubicin response, and p53 status is presented in Table 5. A listing of the cell
lines including description and ATCC (American Type Culture Collection) number
or reference is presented in Table 3. The cell lines provided a framework for
interpreting the variation in gene expression patterns seen in the tumor samples and
included gene expression models for many of the cell types encountered in tumors.

As described in more detail in Example 2, mRNA was isolated from each
sample. cDNA labeled with the fluorescent dye Cy5 was prepared from each
experimental sample separately. Fluorescently labeled cDNA, labeled using a second
distinguishable dye (Cy3), was prepared from a pool of mRNAs isolated from 11
different cultured cell lines. The pooled mRNA sample served as a reference to
provide a common internal standard against which each gene's expression in each
experimental sample was measured.

Comparative expression measurements were made by separately mixing Cy5-
labeled experimental cDNA derived from each of the 84 samples with a portion of the
Cy3-labeled reference cDNA, and hybridizing each mixture to an individual cDNA
microarray. The ratio of Cy5 fluorescence to Cy3 fluorescence measured at each
cDNA element on the microarray was then quantitatively measured. The use of a
common reference standard in each hybridization allowed the fluorescence ratios to be
treated as comparative measurements of the expression level of each gene across all
the experimental samples.
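
The reason a common reference permits comparison between arrays can be shown with a
short worked example. Because every experimental sample is measured against the same
pooled reference, the reference term cancels when two ratios for the same gene are
divided. The function name and the ratio values below are hypothetical and serve only
to illustrate the arithmetic.

```python
def relative_expression(ratio_sample_a, ratio_sample_b):
    """Fold difference of one gene between two samples measured against the same reference."""
    # (A / ref) / (B / ref) = A / B, so the shared reference term cancels.
    return ratio_sample_a / ratio_sample_b

# Hypothetical Cy5/Cy3 ratios for one gene on two separate arrays.
tumor_1_ratio = 6.0   # tumor 1 versus the pooled reference
tumor_2_ratio = 1.5   # tumor 2 versus the pooled reference
fold = relative_expression(tumor_1_ratio, tumor_2_ratio)
```

Here the gene is 4-fold more abundant in the first tumor than in the second, even
though the two tumors were never hybridized to the same array.
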

A hierarchical clustering method (Eisen, et al., 1998) was used to group genes
based on similarity in the pattern with which their expression varied over all
experimental samples. The same clustering method was used to group the
experimental samples (tissue and cell lines separately) based on the similarity in their
patterns of expression. Interpretation of the data obtained from the clustering
algorithm was facilitated by displaying the data in the form of tumor and gene
dendrograms. In the tumor dendrograms, the pattern and length of the branches
reflect the relatedness of the tumor samples with respect to their expression of genes
represented on the microarray. Microarray images and tumor and gene dendrograms
are available in Perou, et al., Nature, 2000, and at the inventors' Web site
(http://genome-www.stanford.edu/molecularportraits/). In general, the similarity of the gene
expression profiles of individual tumor samples or groups of tumor samples to one
another is inversely related to the length of the branches that connect them. Thus, for
example, adjacent tumor samples connected to one another by short vertical branches
descending from a common horizontal branch (e.g., tumor samples Norway 48-BE
and Norway 48-AF close to the right of the tumor dendrogram) are more closely
related to one another in terms of their gene expression profiles than adjacent tumor
samples connected to one another by longer vertical branches descending from a
common horizontal branch (e.g., tumor samples Norway 100-BE and Norway 100-AF
at the left side of the tumor dendrogram). To the extent that the gene expression
programs dictate the biological properties and behavior of the tumors and reflect their
physiological state and environment, it is expected that the clustering of the tumors
reflects phenotypic relationships among them, e.g., tumor samples connected by short
horizontal branches (i.e., located in close proximity to one another) are expected to
exhibit similar phenotypic features. In the gene dendrograms, the pattern and length
of the branches reflect the relatedness of the genes with respect to their expression
profiles across the tumor samples. Similarly to the tumor samples, genes connected
by short vertical branches are more similar to one another in terms of expression
profile than genes connected by longer vertical branches.
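
The relationship between branch length and similarity may be illustrated with a small
agglomerative clustering sketch. It is offered as an assumption-laden illustration, not
as the software of Eisen et al. or of the inventors: average linkage, a distance of one
minus the Pearson correlation, the sample labels, and the expression values are all
choices made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    names = list(profiles)
    # Pairwise distance: 1 - correlation, so similar profiles are close together.
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)   # average over all cross-cluster pairs
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical log2 expression ratios for five genes in four samples.
samples = {
    "sample_A_before": [2.0, 1.5, -1.0, -2.0, 0.5],
    "sample_A_after":  [1.8, 1.6, -0.9, -2.1, 0.4],
    "sample_B":        [-1.5, -1.0, 2.0, 1.0, 0.0],
    "sample_C":        [-1.2, -1.4, 1.8, 1.3, -0.2],
}
merges = average_linkage(samples)
```

Each recorded merge distance corresponds to the height at which two branches join in a
dendrogram. In this example the two samples from the same hypothetical tumor join at a
small distance, while the final merge between the two groups occurs at a much larger
one.
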

The expression patterns of the genes were also displayed using a matrix
format, with each row representing all of the hybridization results for a single cDNA
element on the array and each column representing the measured expression levels for
all genes in a single sample. In this format, tumor samples with similar patterns of
expression across the gene set are close to each other along the horizontal dimension.
Similarly, genes with similar expression patterns across the set of samples are close to
each other along the vertical dimension. To allow the patterns of expression to be
visualized, the normalized expression value of each gene was represented by a colored
box, using red to represent expression levels greater than the median and green to
represent expression levels less than the median. In all images the brightest red color
represents transcript levels at least 16-fold greater than the median, and the brightest
green color represents transcript levels at least 16-fold below the median. This
display format facilitates comparisons between genes and the recognition of
significant patterns.
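
A minimal sketch of such a color mapping follows. The description above fixes only the
red and green directions and the 16-fold saturation point; the linear scaling in log2
space, the use of black at the median, the 8-bit RGB encoding, and the function name are
assumptions made for the example.

```python
import math

def ratio_to_rgb(log2_value, saturation_fold=16.0):
    """Map a median-centered log2 expression value to an (R, G, B) tuple."""
    max_log = math.log2(saturation_fold)                 # 16-fold equals 4 log2 units
    scaled = max(-1.0, min(1.0, log2_value / max_log))   # clip beyond the saturation point
    level = round(255 * abs(scaled))
    # Above the median: shades of red. At or below the median: shades of green.
    return (level, 0, 0) if scaled > 0 else (0, level, 0)

# Hypothetical values: 16-fold above, 32-fold above, at, and 16-fold below the median.
colors = [ratio_to_rgb(v) for v in (4.0, 5.0, 0.0, -4.0)]
```

Values beyond 16-fold are clipped, so a 32-fold increase is drawn in the same brightest
red as a 16-fold increase.
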

As described herein, systematic investigation of gene expression patterns in
human breast tumors and their correlation to specific features of phenotypic variation
offers a basis for an improved molecular taxonomy of breast cancers. Such a
taxonomy has significant clinical utility. For example, correlation of gene expression
patterns with outcome in the absence of treatment is of use in deciding whether a
patient should receive adjuvant chemotherapy after surgery. As another example,
genes whose expression level varies between tumors that are sensitive to
chemotherapy and tumors that are resistant to chemotherapy are of use in predicting
likelihood of response and in selection of appropriate treatment. Genes whose
expression level varies between tumor samples taken before and after therapy are of
use in understanding the response of tumors to treatment.

IV. Further Aspects of the Invention 

A. Basal tumor subclasses and corresponding gene subsets

Gene and tumor dendrograms were derived from data obtained by performing 
a microarray analysis on the set of breast tumor and breast tissue samples described 
above, using a set of genes (the "intrinsic" gene set) described further below and in 
Example 8. Appendices A and C present the resulting tumor dendrograms and color
matrix displays of the gene expression profiles obtained. Although technically the
dendrograms identify groups of tumor samples, since each sample is obtained from a
specific tumor the dendrograms also identify groups of tumors. Thus, in general, a
group of tumor samples corresponds to a group of tumors. Therefore, throughout
most of the discussion herein reference will be made to tumor groups, classes, etc.,
rather than tumor sample groups, classes, etc. The clustering method permits the
identification of subsets of genes with related expression profiles across a set of
tumors and the identification of groups or classes of tumors with similar expression
profiles across a set of genes. Although the existence of gene subsets is revealed by
the display of the data in dendrogram format, understanding the significance of the
gene subsets obtained in experiments such as those described above requires
interpretation in light of knowledge about the genes and tumor samples. Groups of
tumors identified based on their expression patterns of sets of genes (e.g., groups of
tumors that overexpress genes in particular gene subsets) can be designated as tumor
classes when deemed sufficiently distinct to warrant a distinct classification.

Table 5 includes information regarding the clinical outcome of the tumors
from which the samples were obtained. In particular, the table includes survival time
of the patients and, for some of the tumors, whether or not the tumor responded to
chemotherapy (doxorubicin). Such information was used to demonstrate that the
basal tumor class is characterized by a poor clinical outcome relative to the other
tumors. Differences in survival between groups of patients were demonstrated using
the Kaplan-Meier technique for survival analysis, which is implemented in computer
software such as the SAS package (SAS Institute, Inc., Cary, NC) and described in the
accompanying manual. Of course various other statistical techniques can be used to
detect differences in survival or any other clinical parameters between groups of
tumors. Various appropriate statistical techniques useful for analyzing survival are
discussed, for example, in Lawless, J.F., Statistical Models and Methods for Lifetime
Data. New York: John Wiley & Sons, 1982; Lee, Elisa T., Statistical Methods for
Survival Data Analysis. 2nd ed. New York: John Wiley & Sons, 1992; Marubini,
Ettore, and Valsecchi, Maria Grazia, Analysing Survival Data from Clinical Trials
and Observational Studies. New York: John Wiley & Sons, 1995; Miller, Rupert G.,
Jr., Survival Analysis. New York: John Wiley & Sons, 1981; and Rosner, Bernard,
Fundamentals of Biostatistics. 4th ed. Belmont, California: Duxbury Press, 1995.
Other clinical parameters of importance include response to therapy, time to
recurrence, etc.
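
For readers unfamiliar with the Kaplan-Meier technique, a minimal sketch of the product-limit estimate is given here. It is not the SAS implementation referred to above; the function name and input layout are assumptions made for illustration, and no confidence intervals or between-group tests are computed.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each patient (any consistent unit)
    events: True if the death was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)      # patients still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve
```

Running the function separately on each tumor group gives one curve per group; a formal comparison of the curves would require an additional test, which this sketch omits.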

As will be appreciated by one of ordinary skill in the art, the correlation of particular
tumor groups with survival or other parameters of clinical importance can be
strengthened by the inclusion of data obtained from additional tumor samples.

The invention identifies genes and gene subsets that are associated with the
basal tumor subclass. The genes and gene subsets are identified in part by the
overexpression of certain members of each subset in a particular tumor group and are
also defined in part based on the proximity of genes within each subset to one another
in a gene dendrogram. As used herein unless otherwise stated, a gene is
overexpressed in a tissue sample at the RNA level if an mRNA corresponding to (i.e.,
transcribed from) the gene is present in excess relative to the median abundance of
that mRNA across the set of analyzed specimens. A gene is overexpressed in a tissue
sample at the protein level if a polypeptide corresponding to (i.e., translated from an
mRNA that was transcribed from) the gene is present in excess relative to the
abundance of that polypeptide across the set of analyzed specimens. The
measurement of relative abundance using cDNA microarrays relies upon the
comparison of all samples relative to a common reference sample that provides
cognate mRNA for as many genes as possible with the goal of providing a common
denominator for the measured ratios across all samples. Each tested sample can be
compared to all other tested samples in ratio units relative to the reference. This
allows reproducible determination of gene expression in each tested sample relative to
the median gene expression across any given sample set (Ross, DT, et al., Systematic
variation in gene expression patterns in human cancer cell lines, Nat Genet
24(3):227-35, 2000). In general, an appropriate reference sample comprises a
renewable source of diverse cell samples such as a mixture of cells obtained from the
panel of 11 cell lines listed in Table 3. A particularly preferred reference sample is
one in which all relevant genes are represented in significant abundance above
measured background. This provides for a reproducible measurement of reference
signal for all relevant genes. As is well known in the art, there is generally a
correlation between overexpression or underexpression at the RNA level and
overexpression or underexpression at the protein level. In other words, if an mRNA is
overexpressed then it is highly likely that the corresponding polypeptide is also
overexpressed, and if an mRNA is underexpressed then it is highly likely that the
corresponding polypeptide is underexpressed. Therefore, detection of either mRNA
or a corresponding polypeptide is generally sufficient to determine whether a
particular gene is over or underexpressed. However, as is well known in the art, in
certain situations it may be more convenient and/or practical to detect mRNA while in
other situations it may be more convenient and/or practical to detect polypeptides.

As mentioned above, genes that are overexpressed in one or more samples
may be identified by examining the microarray data displayed in matrix format,
wherein red squares indicate overexpression. The basal gene subset includes a
number of genes known to be expressed in basal epithelial cells (e.g., cytokeratins 5
and 17) and is characterized in that certain of the genes in the subset are
overexpressed at the RNA level in samples obtained from a subset of tumors that had
a poor prognosis relative to the entire group of tumors (the basal group). Referring to
Perou, et al., Nature, 2000, the basal gene subset comprises two subsets identified with
a blue bar and a green bar along the side of the color matrices. Genes in the basal
gene subset are, in general, overexpressed in tumors in the basal tumor group
(identified with orange dendrogram branches). Of course it will be appreciated that
additional genes, not necessarily falling into either of the two basal gene subsets, also
have an expression pattern similar to that of cytokeratin 5 and/or cytokeratin 17.
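
The definition of overexpression at the RNA level used herein (abundance in excess of the median across the analyzed specimens, with every sample measured as a ratio to a common reference) can be sketched as follows. The function names, the sample labels used in the note after the code, and the optional `min_fold` cutoff are hypothetical and are introduced only for illustration.

```python
from statistics import median

def ratios_to_median(ratios_vs_reference):
    """Re-express one gene's sample/reference ratios relative to their median.

    ratios_vs_reference: dict mapping sample name -> (sample / reference) ratio.
    Because every sample shares the same reference, the reference cancels out.
    """
    med = median(ratios_vs_reference.values())
    return {sample: ratio / med for sample, ratio in ratios_vs_reference.items()}

def overexpressed(ratios_vs_reference, min_fold=1.0):
    """List samples in which the gene exceeds the median by more than min_fold."""
    relative = ratios_to_median(ratios_vs_reference)
    return sorted(s for s, value in relative.items() if value > min_fold)
```

With hypothetical ratios `{"T1": 0.5, "T2": 1.0, "T3": 2.0, "T4": 8.0, "T5": 1.5}` the median is 1.5, so the gene counts as overexpressed in T3 and T4; raising `min_fold` to 4 leaves only T4.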

It will be appreciated that not all of the genes are overexpressed to a similar
extent within a particular group of tumors and that expression of any given gene will
likely vary between different tumors in a group. For example, genes identified as
"Cytochrome P450, subfamily IIA" and "Lymphoid nuclear protein related to AF4"
are significantly overexpressed in tumors at the far right of the luminal tumor group
(Stanford 24, Norway 27, 28, 26, and 56) while they are expressed at lesser levels in
other members of the luminal tumor group. Conversely, genes identified as "417081"
and "Homo Sapiens PWD gene mRNA, 3' end" are, in general, relatively
underexpressed in these tumors. However, the overall expression patterns of genes in
each subset over all tissue samples are sufficiently similar to cause them to cluster in
close proximity on the gene dendrogram. Thus whether a gene is a member of one of
the inventive gene subsets is not determined solely on the basis of the overexpression
of that gene within a tumor subset but also on the relationship of the overall
expression pattern of the gene to the expression pattern of other genes within the
subset. It will further be appreciated that a gene may be overexpressed in more than
one tumor group. For example, certain of the genes in the basal subset are expressed
in a group identified with green dendrogram branches, which includes both tumor and
normal tissue samples, in addition to being overexpressed in the basal tumor group.

B. Diagnostics and methods of use thereof

The invention provides reagents for detecting expression products of the basal
marker genes described herein, i.e., cadherin3, matrix metalloproteinase 14, and
cadherin EGF LAG seven-pass G-type receptor 2. Detection of these expression
products identifies tumors in the basal tumor subclass. While not wishing to be bound
by any theory, inventors suggest that breast carcinoma with basal cell-like features has
distinguishing biology that could be targeted in therapeutic development. Once
therapeutics targeted at such tumors are identified (as described elsewhere herein),
detection of these expression products allows identification of subjects likely to
benefit from these therapeutics. In addition, since the invention has established a
correlation between the expression of the three basal marker genes and the expression
of cytokeratin 17 and also established that cytokeratin 5/6 and/or cytokeratin 17
expression in breast tumors correlates with a poor outcome, detection of expression of
the basal marker genes is useful in guiding therapeutic decisions in general. If it is
known that a patient has a tumor that falls into the basal tumor subclass and thus has a
poor prognosis, a more aggressive approach to therapy may be warranted than in
tumors not falling within the basal subclass. For example, in patients where there is
no evidence of disease in lymph nodes (node-negative patients), a decision must be
made regarding whether to administer chemotherapy (adjuvant therapy) following
surgical removal of the tumor. While some patients are likely to benefit from such
treatment, it has significant side effects. Presently it is difficult or impossible to
predict which patients would benefit. Knowing that a patient falls into a poor
prognosis category may help in this decision. Of note, inventors showed that in
node-negative patients cytokeratin 5/6 and/or 17 expression was a prognostic factor
independent of tumor size and tumor grade. See Example 13 for further discussion of
these issues and inventors' findings. Detecting expression of the basal marker genes
of the present invention may provide information related to tumor progression. It is
well known that as tumors progress, their phenotypic characteristics may change. The
invention contemplates the possibility that breast tumors may evolve from
luminal-like to basal-like (or vice versa), and that detection of expression products of the basal
marker genes can be used to detect such progression.




It is well known in the art that some tumors respond to certain therapies while others
do not. In general there is very little information that may be used to determine, prior
to treatment, the likelihood that a specific tumor will respond to a given therapeutic
agent. Many compounds have been tested for anti-tumor activity and appear to be
effective in only a small percentage of tumors. Due to the current inability to predict
which tumors will respond to a given agent, these compounds have not been
developed into marketed therapeutics. This problem reflects the fact that current
methods of classifying tumors are limited. However, the present invention offers the
possibility of identifying tumor subgroups characterized by a significant likelihood of
response to a given agent. Tumor sample archives containing tissue samples obtained
from patients that have undergone therapy with various agents are available along
with information regarding the results of such therapy. In general such archives
consist of tumor samples embedded in paraffin blocks. These tumor samples can be
analyzed for their expression of polypeptides encoded by the basal marker genes of
the present invention. For example, immunohistochemistry can be performed using
antibodies that bind to the polypeptides. Tumors belonging to the basal tumor
subclass may then be identified on the basis of this information. It is then possible to
correlate the expression of the basal marker genes with the response of the tumor to
therapy, thereby identifying particular compounds that show a superior efficacy in
tumors in this class as compared with their efficacy in tumors overall or in tumors not
falling within the basal tumor subclass. Once such compounds are identified it will be
possible to select patients whose tumors fall into the basal tumor subclass for
additional clinical trials using these compounds. Such clinical trials, performed on a
selected group of patients, are more likely to demonstrate efficacy. The reagents
provided herein, therefore, are valuable both for retrospective and prospective trials.
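
One conventional way to test whether marker status and response to therapy are associated in such an archive is the Fisher exact test on a 2x2 table (marker-positive or marker-negative against responder or non-responder). The specification does not name a particular test; the sketch below is offered only as an assumption about how the correlation might be assessed, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    For example: a = marker-positive responders, b = marker-positive
    non-responders, c = marker-negative responders, d = marker-negative
    non-responders.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
```

A small p-value would suggest that response differs between marker-positive and marker-negative tumors; it says nothing by itself about the size of the difference.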

In the case of prospective trials, detection of expression products of one or
more of the marker genes may be used to stratify patients prior to their entry into the
trial or while they are enrolled in the trial. In clinical research, stratification is the
process or result of describing or separating a patient population into more
homogeneous subpopulations according to specified criteria. Stratifying patients
initially rather than after the trial is frequently preferred, e.g., by regulatory agencies
such as the U.S. Food and Drug Administration that may be involved in the approval
process for a medication. In some cases stratification may be required by the study
design. Various stratification criteria may be employed in conjunction with detection
of expression of one or more basal marker genes. Commonly used criteria include
age, family history, lymph node status, tumor size, tumor grade, etc. Other criteria
including, but not limited to, tumor aggressiveness, prior therapy received by the
patient, ER and/or PR positivity, Her2neu status, p53 status, various other biomarkers,
etc., may also be used. Stratification is frequently useful in performing statistical
analysis of the results of a trial. Ultimately, once compounds that exhibit superior
efficacy against breast basal tumors are identified, reagents for detecting expression of
the basal marker genes may be used to guide the selection of appropriate
chemotherapeutic agent(s).
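
Stratification as defined above amounts to partitioning the patient population by the values of the chosen criteria. A minimal sketch follows; the record layout and the field names `id`, `basal_marker`, and `node_status` are hypothetical and serve only to make the example concrete.

```python
from collections import defaultdict

def stratify(patients, criteria):
    """Group patient identifiers into strata keyed by their criteria values.

    patients: iterable of dicts, each with an "id" key plus one key per criterion
    criteria: ordered list of field names defining the strata
    """
    strata = defaultdict(list)
    for patient in patients:
        # Patients with identical values for every criterion share a stratum.
        key = tuple(patient[name] for name in criteria)
        strata[key].append(patient["id"])
    return dict(strata)
```

For instance, calling `stratify(patients, ["basal_marker", "node_status"])` would yield up to four strata for a binary marker result combined with node-positive or node-negative status, and randomization or analysis could then proceed within each stratum.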

In summary, by providing reagents and methods for classifying tumors based
on their expression of the basal marker genes, the present invention offers a means to
individualize therapy. The invention further provides a means to identify a patient
population that may benefit from potentially promising therapies that have been
abandoned due to inability to identify the patients who would benefit from their use.

Information regarding the expression of the basal marker genes is useful even
in the absence of specific information regarding their biological function or role in
tumor development, progression, maintenance, or response to therapy. Although the
reagents disclosed herein find particular application with respect to breast cancer, the
invention also contemplates their use to provide diagnostic and/or prognostic
information for other cancer types. As is well known in the art, mutations in a single
gene (e.g., the p53 gene) may play a role in the development of multiple cancer types.
Thus it is contemplated that some or all of the basal marker genes described herein
will be important both in breast cancer and in one or more other tumor types,
particularly since basal cells are a feature of epithelia throughout the body.

In one aspect, the invention provides a method of classifying tumors by
detecting the presence of one or more of the inventive gene products encoded by the
cadherin3, matrix metalloproteinase 14, and cadherin EGF LAG seven-pass G-type
receptor 2 genes. As is well known in the art, a polypeptide may be detected using a
variety of techniques that employ an antibody that binds to the polypeptide. As
described further below, these techniques include enzyme-linked immunosorbent
assay (ELISA), immunoblot, and immunohistochemistry. The invention encompasses
the use of protein arrays, including antibody arrays, for detection of the polypeptide.
The use of antibody arrays is described, for example, in Haab, B., et al., "Protein
microarrays for highly parallel detection and quantitation of specific proteins and
antibodies in complex solutions", Genome Biol 2(2), 2001. Other types of
protein arrays are known in the art.

In addition, in certain embodiments of the invention the polypeptides are
detected using other modalities known in the art for the detection of polypeptides,
such as aptamers (Aptamers, Molecular Diagnosis, Vol. 4, No. 4, 1999), reagents
derived from combinatorial libraries for specific detection of proteins in complex
mixtures, random peptide affinity reagents, etc. In general, any appropriate method
for detecting a polypeptide may be used in conjunction with the present invention,
although antibodies may represent a particularly appropriate modality.

The invention provides antibodies to the polypeptides encoded
by the cadherin3, matrix metalloproteinase 14, and cadherin EGF LAG seven-pass
G-type receptor 2 genes. Example 10 describes the generation of polyclonal antibodies
to these polypeptides. In general, antibodies (either monoclonal or polyclonal) may
be generated by methods well known in the art and described, for example, in Harlow,
E., Lane, E., and Harlow, E., (eds.) Using Antibodies: A Laboratory Manual, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, 1998. Details and references
for the production of antibodies based on an inventive polypeptide may also be found
in U.S. Patent No. 6,008,337. Antibodies may include, but are not limited to,
polyclonal, monoclonal, chimeric (e.g., "humanized"), and single chain antibodies,
and Fab fragments, antibodies generated using phage display technology, etc. The
invention encompasses "fully human" antibodies produced using the XenoMouse™
technology (AbGenix Corp., Fremont, CA) according to the techniques described in
U.S. Patent No. 6,075,181.

The invention encompasses a number of uses for these antibodies. Detection
of the basal marker polypeptides may be used to provide diagnostic information. As
used herein the term "diagnostic information" includes, but is not limited to, any type
of information that is useful in determining whether a patient has, or is at increased
risk for developing, a disease or disorder; for providing a prognosis for a patient
having a disease or disorder; for classifying a disease or disorder; for monitoring a
patient for recurrence of a disease or disorder; for selecting a preferred therapy; for
predicting the likelihood of response to a therapy, etc. In certain embodiments of the
invention, the antibodies are used for providing diagnostic information for cancer,
particularly for breast cancer, but they may also be of use for providing diagnostic
information for other diseases, e.g., other types of cancer.

In general, diagnostic assays in which the antibodies may be employed include
methods that use the antibody to detect the polypeptide in a tissue sample, cell
sample, body fluid sample (e.g., serum), cell extract, etc. Such methods typically
involve the use of a labeled secondary antibody that recognizes the primary antibody
(i.e., the antibody that binds to the polypeptide being detected). Depending upon the
nature of the sample, appropriate methods include, but are not limited to,
immunohistochemistry, radioimmunoassay, ELISA, immunoblotting, and FACS
analysis. In the case where the polypeptide is to be detected in a tissue sample, e.g., a
biopsy sample, immunohistochemistry is a particularly appropriate detection method.
Techniques for obtaining tissue and cell samples and performing
immunohistochemistry and FACS are well known in the art. Such techniques are
routinely used, for example, to detect the ER in breast tumor tissue or cell samples. In
general, such tests will include a negative control, which can involve applying the test
to normal tissue so that the signal obtained thereby can be compared with the signal
obtained from the sample being tested. In tests in which a secondary antibody is used
to detect the antibody that binds to the polypeptide of interest, an appropriate negative
control can involve performing the test on a portion of the sample with the omission
of the antibody that binds to the polypeptide to be detected, i.e., with the omission of
the primary antibody. Antibodies suitable for use as diagnostics generally exhibit
high specificity for the target polypeptide and low background. In general,
monoclonal antibodies are preferred for diagnostic purposes.

In general, the results of such a test can be presented in any of a variety of
formats. The results can be presented in a qualitative fashion. For example, the test
report may indicate only whether or not a particular polypeptide was detected, perhaps
also with an indication of the limits of detection. The results may be presented in a
semi-quantitative fashion. For example, various ranges may be defined, and the
ranges may be assigned a score (e.g., 1+ to 4+) that provides a certain degree of
quantitative information. Such a score may reflect various factors, e.g., the number of
cells in which the polypeptide is detected, the intensity of the signal (which may
indicate the level of expression of the polypeptide), etc. The results may be presented
in a quantitative fashion, e.g., as a percentage of cells in which the polypeptide is
detected, as a protein concentration, etc. As will be appreciated by one of ordinary
skill in the art, the type of output provided by a test will vary depending upon the
technical limitations of the test and the biological significance associated with
detection of the polypeptide. For example, in the case of certain polypeptides a purely
qualitative output (e.g., whether or not the polypeptide is detected at a certain
detection level) provides significant information. In other cases a more quantitative
output (e.g., a ratio of the level of expression of the polypeptide in the sample being
tested versus the normal level) is necessary.

Sequence analysis of two of the basal marker proteins, matrix
metalloproteinase 14 and cadherin EGF LAG seven-pass G-type receptor 2, indicates
that they possess one or more transmembrane domains and an extracellular portion.
Sequence analysis of the third basal marker protein, cadherin3, indicates that it also
has an extracellular portion. The invention encompasses the recognition that since
these proteins have an extracellular domain, the likelihood exists that a portion of
these proteins may therefore be present in serum (e.g., the portion may be cleaved by
endogenous proteases and released into the bloodstream), enabling their detection
through a blood test rather than requiring a biopsy specimen. Regardless of whether
the proteins are present in serum, the likelihood that cadherin3, cadherin EGF LAG
seven-pass G-type receptor 2, and matrix metalloproteinase 14 are membrane bound
makes them attractive candidates for antibody diagnostics. The proteins may be
detected on cells that enter the bloodstream or in samples obtained from a tumor site
(e.g., cell or tissue samples).

Measurement of prostate specific antigen (PSA) in serum using an
immunoassay technique is widely used as a method for early detection of prostate
cancer and for monitoring recurrence or progression after therapy, etc. Methods and
considerations in the use of this clinical marker are described, for example, in Chen
DW, et al. Prostate-specific antigen as a marker for prostate cancer: A monoclonal
and polyclonal immunoassay compared. Clin Chem, 33:1916-1920, 1987; Oesterling
JE, et al. Free, complexed and total serum prostate specific antigen: The
establishment of appropriate reference ranges for their concentrations and ratios. J
Urol 154:1090-1095, 1995; Hybritech Tandem®-MP Free PSA. Package insert. March
1998 and Hybritech Tandem® Total PSA. Package insert., Hybritech, Inc., San Diego,
CA. One of ordinary skill in the art will readily be able to develop appropriate assays
for polypeptides encoded by the basal marker genes described herein and to apply
them to the detection of such polypeptides in serum. Such assays may be used as
screening tests for cancer, to detect recurrence or progression of cancer, to monitor the
response of cancer to therapy, to classify and/or provide prognostic information
regarding a tumor, etc.

In certain embodiments of the inventive methods a single antibody is used,
whereas in other embodiments of the invention multiple antibodies, directed either
against the same or against different polypeptides, can be used to increase the
sensitivity or specificity of the test or to provide more detailed information than that
provided by a single antibody. Thus the invention encompasses the use of a battery of
antibodies that bind to polypeptides encoded by the basal marker genes identified
herein. Of course these antibodies can also be used in conjunction with antibodies
against other polypeptides, including antibodies that bind to cytokeratin 5/6 or 17.

Various other techniques for detecting the basal marker polypeptides identified
herein are within the scope of the invention. For example, a basal marker polypeptide
may be detected using an assay for a biochemical activity of the polypeptide, e.g., an
enzymatic activity. This type of assay may be especially convenient for tests on
samples such as blood or other body fluids. Such an approach may be particularly
attractive in the case of matrix metalloproteinase 14. As described above, matrix
metalloproteinases are involved in cleavage of various proteins in the extracellular
matrix. The cleavage specificity of this protein may readily be determined, and an
appropriate substrate prepared. (See, e.g., Turk, B., et al., "Determination of protease
cleavage site motifs using mixture-based oriented peptide libraries", Nature
Biotechnology 19(7): 661-667, 2001, which discusses cleavage site motifs for various
metalloproteases including MMP14, referred to as MT1-MMP therein.) Cleavage of
this substrate may then be detected. In certain embodiments of the invention the
substrate includes a fluorescent moiety for convenient detection. The invention
contemplates use of fluorescence resonance energy transfer (FRET) assays to detect
matrix metalloproteinase 14 (see http://www.aurorabio.com).

Although in many cases detection of polypeptides using antibodies represents
the most convenient means of determining whether a gene is expressed (or
overexpressed) in a particular sample, the invention also encompasses the use of
polynucleotides for this purpose. Microarray analysis is but one means by which
polynucleotides can be used to detect or measure gene expression. Expression of a
gene can also be measured by a variety of techniques that make use of a
polynucleotide corresponding to part or all of the gene rather than an antibody that
binds to a polypeptide encoded by the gene. Appropriate techniques include, but are
not limited to, in situ hybridization, Northern blot, and various nucleic acid
amplification techniques such as PCR, quantitative PCR, and the ligase chain
reaction.

One detection method involves performing quantitative PCR on a diagnostic
sample using a set of oligonucleotide primers designed to amplify the genes in one or
more of the inventive gene sets or gene subsets. (Considerations for primer design are
well known in the art and are described, for example, in Newton, et al. (eds.) PCR:
Essential data Series, John Wiley & Sons; PCR Primer: A Laboratory Manual, Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1995; White, et al. (eds.)
PCR Protocols: Current methods and Applications, Methods in Molecular Biology,
The Humana Press, Totowa, NJ, 1993. In addition, a variety of computer programs
known in the art may be used to select appropriate primers.)

According to one embodiment of this method the diagnostic sample is
distributed into multiple vessels, e.g., multiple wells of a 396 well microtiter plate. A
pair of primers designed to amplify a portion of a gene in one of the inventive gene
sets or subsets is added to each well, and PCR amplification is performed. The
resulting product can then be detected using any of a number of methods known in the
art depending upon the particular method of performing quantitative PCR that is
employed. Primers sufficient for amplification of genes that allow quantitation of
different cell types within the sample may also be included in the set of primers.
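
The specification leaves the method of quantitation open. One widely used approach to relative quantitation, given here purely as an illustrative assumption and not as the method of the invention, is the comparative threshold-cycle (2^-ΔΔCt) calculation, which compares a target gene with a reference gene in the test sample and in a control sample. It assumes that both amplicons amplify with roughly 100% efficiency; the function and parameter names are hypothetical.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt).

    Each argument is a threshold cycle (Ct) from a quantitative PCR run.
    Assumes near-100% amplification efficiency for both primer pairs.
    """
    # Normalize the target to the reference gene within each specimen.
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    # A lower Ct means more template, hence the negative exponent.
    return 2.0 ** -(delta_sample - delta_control)
```

As a worked case, Ct values of 22 and 18 (target and reference) in the sample against 25 and 18 in the control give ΔΔCt = 4 - 7 = -3, that is, an 8-fold higher level of the target in the sample.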






The invention also encompasses the detection of mutations within any of the
basal marker genes or within a regulatory region of a basal marker gene. Such
mutations may include, but are not limited to, deletions, additions, substitutions, and
amplification of regions of genomic DNA that include all or part of a gene. Methods
for detecting such mutations are well known in the art. Such mutations may result in
overexpression or inappropriate expression of the gene. Detection of mutations can be
used, for example, to predict the likelihood that an individual will develop a condition
associated with the mutation.

Another aspect of the invention comprises a kit to test for the presence of any
of the inventive polynucleotides or polypeptides, e.g., in a tissue sample or in a body
fluid. The kit can comprise, for example, an antibody for detection of a polypeptide
or a probe for detection of a polynucleotide. In addition, the kit can comprise a
reference or control sample; instructions for processing samples, performing the test,
and interpreting the results; and buffers and other reagents necessary for performing the
test. In certain embodiments of the invention the kit comprises a panel of antibodies.
In certain embodiments of the invention the kit comprises pairs of primers for
detecting expression of one or more of the basal marker genes. In certain
embodiments of the invention the kit comprises a cDNA or oligonucleotide array for
detecting expression of one or more of the basal marker genes.


D. Therapeutics 

The invention encompasses the use of the basal marker genes and their
expression products as targets for the development of therapeutics. The invention
specifically encompasses antagonists to the basal marker genes and their expression
products. Such antagonists (which include, but are not limited to, antibodies, small
molecules, and antisense nucleic acids) may be produced or identified using any of a
variety of methods known in the art. For example, a purified polypeptide or fragment
thereof may be used to raise antibodies or to screen libraries of compounds to identify
those that specifically bind to the polypeptide. The likelihood that cadherin3,
cadherin EGF LAG seven-pass G-type receptor 2, and matrix metalloproteinase 14 are
membrane bound makes them attractive candidates for antibody therapeutics.




Preferably antibodies suitable for use as therapeutics exhibit high specificity
for the target polypeptide and low background binding to other polypeptides. In
general, monoclonal antibodies are preferred for therapeutic purposes. In the case of
breast cancer, antibodies against the HER2/neu/ErbB2 polypeptide (a polypeptide
homologous to the epidermal growth factor receptor) represent a paradigm in terms of
the development of therapeutic antibodies. The HER2/neu/ErbB2 gene is
overexpressed in approximately 25 to 30 percent of metastatic breast tumors, and an
antibody against the HER2/neu/ErbB2 polypeptide, Herceptin® (Trastuzumab), is
approved for the treatment of certain patients with metastatic breast cancer,
confirming the utility of therapeutic antibodies directed against polypeptides that are
specifically overexpressed in particular tumor subsets. Proteins that are expressed on
the cell surface, such as the basal marker proteins described herein, represent preferred
targets for the development of therapeutic agents, particularly therapeutic antibodies.
The presence of these proteins on the cell surface can be confirmed using
immunohistochemistry.

Antibodies directed against a polypeptide expressed by a cell may have a
number of mechanisms of action. In certain instances, e.g., in the case of a
polypeptide that exerts a growth stimulatory effect on a cell, antibodies may directly
antagonize the effect of the polypeptide and thereby arrest tumor progression, trigger
apoptosis, etc. While not wishing to be bound by any theory, it may be particularly
likely that certain genes that are overexpressed in tumors having a poor prognosis
(e.g., genes in the basal gene subsets) encode polypeptides that have a growth
stimulatory effect on tumor cells or facilitate the growth of such cells in some other
way, e.g., by enhancing angiogenesis, by allowing cells to overcome normal growth
regulatory mechanisms, or by blocking mechanisms that would normally lead to
elimination of mutated or otherwise abnormal cells.

In certain embodiments of the invention the antibody may serve to target a
toxic moiety to the cell. Thus the invention encompasses the use of antibodies that
have been conjugated with a cytotoxic agent, e.g., a toxin such as ricin or diphtheria
toxin, a radioactive moiety, etc. Such antibodies can be used to direct the cytotoxic
agent specifically to cells that express the inventive polypeptide, particularly in the
case of a polypeptide that is expressed on the cell surface.

Although certain antagonists may function through direct interaction with a
polypeptide, e.g., by inhibiting its activity, others may function by affecting
expression of the polypeptide. Reduction in expression of an endogenously produced
polypeptide may be achieved by the administration of antisense nucleic acids (e.g.,
oligonucleotides, RNA, DNA, most typically oligonucleotides that have been
modified to improve stability or targeting) or peptide nucleic acids comprising
sequences complementary to those of the mRNA that encodes the polypeptide.
Antisense technology and its applications are described in Phillips, M.I. (ed.)
Antisense Technology, Methods Enzymol., Volumes 313 and 314, Academic Press,
San Diego, 2000, and references mentioned therein. Ribozymes (catalytic RNA
molecules that are capable of cleaving other RNA molecules) represent another
approach to reducing gene expression. Such ribozymes can be designed to cleave
specific mRNAs corresponding to a gene of interest. Their use is described in U.S.
Patent No. 5,972,621, and references therein. The invention encompasses the delivery
of antisense and/or ribozyme molecules via a gene therapy approach in which vectors
or cells expressing the antisense molecules are administered to an individual.

It may also be desirable to increase the expression of a gene in an inventive
gene subset or to increase the activity of the corresponding polypeptide. For example,
in the case of genes that are overexpressed in tumors having a good prognosis, e.g.,
certain genes in the luminal subset, it may be desirable to increase the expression of
such genes or the activity of the corresponding polypeptides in tumors that fail to
express these genes.

Small molecule modulators (e.g., inhibitors or activators) of gene expression
are also within the scope of the invention and may be detected by screening libraries
of compounds using, for example, cell lines that express the polypeptide or a version
of the polypeptide that has been modified to include a readily detectable moiety.
Methods for identifying compounds capable of modulating gene expression are
described, for example, in U.S. Patent No. 5,976,793. The screening methods
described therein are particularly appropriate for identifying compounds that do not
naturally occur within cells and that modulate the expression of genes of interest
whose expression is associated with a defined physiological or pathological effect
within a multicellular organism.

More generally, the invention encompasses compounds that modulate the
activity of a basal marker gene of the present invention. Methods of screening for
such interacting compounds are well known in the art and depend, to a certain degree,
on the particular properties and activities of the polypeptide encoded by the gene.
Representative examples of such screening methods may be found, for example, in
U.S. Patent No. 5,985,829, U.S. Patent No. 5,726,025, U.S. Patent No. 5,972,621, and
U.S. Patent No. 6,015,692. The skilled practitioner will readily be able to modify and
adapt these methods as appropriate for a given polypeptide. Thus the invention
encompasses methods of screening for molecules that modulate the activity of a
polypeptide encoded by a basal marker gene.

The invention also encompasses the use of polynucleotide sequences
corresponding to basal marker genes, or portions thereof, as DNA vaccines. Such
vaccines comprise polynucleotide sequences, typically inserted into vectors, that
direct the expression of an antigenic polypeptide within the body of the individual
being immunized. Details regarding the development of vaccines, including DNA
vaccines for various forms of cancer, may be found, for example, in Brinckerhoff L.H.,
Thompson L.W., Slingluff C.L., Jr., Melanoma Vaccines, Curr Opin Oncol,
12(2):163-73, 2000 and in Stevenson, F.K., DNA vaccines against cancer: from genes
to therapy, Ann. Oncol, 10(12):1413-8, 1999 and references therein. The
polypeptides, or fragments thereof, that are encoded by genes in the inventive gene
subsets may also find use as cancer vaccines. Such vaccines may be used for the
prevention and/or treatment of cancer.

The invention includes pharmaceutical compositions comprising the inventive
antibodies, or small molecule inhibitors, agonists, or antagonists described above. In
general, a pharmaceutical composition will include an active agent in addition to one
or more inactive agents such as a sterile, biocompatible carrier including, but not
limited to, sterile water, saline, buffered saline, or dextrose solution. The
pharmaceutical compositions may be administered either alone or in combination with
other therapeutic agents including other chemotherapeutic agents, hormones, vaccines,
and/or radiation therapy. By "in combination with", it is not intended to imply that
the agents must be administered at the same time or formulated for delivery together,
although these methods of delivery are within the scope of the invention. In general,
each agent will be administered at a dose and on a time schedule determined for that
agent. Additionally, the invention encompasses the delivery of the inventive
pharmaceutical compositions in combination with agents that may improve their
bioavailability, reduce or modify their metabolism, inhibit their excretion, or modify
their distribution within the body. The invention encompasses treating cancer,
particularly breast cancer, by administering the pharmaceutical compositions of the
invention. Although the pharmaceutical compositions of the present invention can be
used for treatment of any subject (e.g., any animal) in need thereof, they are most
preferably used in the treatment of humans.

The pharmaceutical compositions of this invention can be administered to
humans and other animals by a variety of routes including oral, intravenous,
intramuscular, intraarterial, subcutaneous, intraventricular, transdermal, rectal,
intravaginal, intraperitoneal, topical (as by powders, ointments, or drops), buccal, or as
an oral or nasal spray or aerosol. In general the most appropriate route of
administration will depend upon a variety of factors including the nature of the
compound (e.g., its stability in the environment of the gastrointestinal tract), the
condition of the patient (e.g., whether the patient is able to tolerate oral
administration), etc. At present the intravenous route is most commonly used to
deliver therapeutic antibodies and nucleic acids. However, the invention encompasses
the delivery of the inventive pharmaceutical composition by any appropriate route
taking into consideration likely advances in the sciences of drug delivery.

General considerations in the formulation and manufacture of pharmaceutical
agents may be found, for example, in Remington's Pharmaceutical Sciences, 19th ed.,
Mack Publishing Co., Easton, PA, 1995. It will be appreciated that certain of the
compounds of the present invention can exist in free form for treatment, or, where
appropriate, in salt form, as discussed in more detail below. Compounds to be
utilized in the pharmaceutical compositions include compounds existing in free form
or pharmaceutically acceptable derivatives thereof, as defined herein, such as
pharmaceutically acceptable salts, esters, salts of such esters, or any other adduct or
derivative, which upon administration to a patient in need, is capable of providing,
directly or indirectly, a compound as otherwise described herein, or a metabolite or
residue thereof, e.g., a prodrug. Thus, as used herein, the term "pharmaceutically
acceptable salt" refers to those salts which are, within the scope of sound medical
judgment, suitable for use in contact with the tissues of humans and lower animals
without undue toxicity, irritation, allergic response and the like, and are
commensurate with a reasonable benefit/risk ratio. Pharmaceutically acceptable salts
are well known in the art. For example, S. M. Berge, et al. describe pharmaceutically
acceptable salts in detail in J. Pharmaceutical Sciences, 66: 1-19 (1977), incorporated
herein by reference. The salts can be prepared in situ during the final isolation and
purification of the compounds of the invention, or separately by reacting the free base
function with a suitable organic acid. Examples of pharmaceutically acceptable,
nontoxic acid addition salts are salts of an amino group formed with inorganic acids
such as hydrochloric acid, hydrobromic acid, phosphoric acid, sulfuric acid and
perchloric acid or with organic acids such as acetic acid, oxalic acid, maleic acid,
tartaric acid, citric acid, succinic acid, or malonic acid or by using other methods used
in the art such as ion exchange. Other pharmaceutically acceptable salts include
adipate, alginate, ascorbate, aspartate, benzenesulfonate, benzoate, bisulfate, borate,
butyrate, camphorate, camphorsulfonate, citrate, cyclopentanepropionate, digluconate,
dodecylsulfate, ethanesulfonate, formate, fumarate, glucoheptonate, glycerophosphate,
gluconate, hemisulfate, heptanoate, hexanoate, hydroiodide, 2-hydroxy-ethanesulfonate,
lactobionate, lactate, laurate, lauryl sulfate, malate, maleate,
malonate, methanesulfonate, 2-naphthalenesulfonate, nicotinate, nitrate, oleate,
oxalate, palmitate, pamoate, pectinate, persulfate, 3-phenylpropionate, phosphate,
picrate, pivalate, propionate, stearate, succinate, sulfate, tartrate, thiocyanate,
p-toluenesulfonate, undecanoate, valerate salts, and the like. Representative alkali or
alkaline earth metal salts include sodium, lithium, potassium, calcium, magnesium,
and the like. Further pharmaceutically acceptable salts include, when appropriate,
nontoxic ammonium, quaternary ammonium, and amine cations formed using
counterions such as halide, hydroxide, carboxylate, sulfate, phosphate, nitrate, lower
alkyl sulfonate and aryl sulfonate.

Additionally, as used herein, the term "pharmaceutically acceptable ester"
refers to esters that hydrolyze in vivo and include those that break down readily in the
human body to leave the parent compound or a salt thereof. Suitable ester groups
include, for example, those derived from pharmaceutically acceptable aliphatic
carboxylic acids, particularly alkanoic, alkenoic, cycloalkanoic and alkanedioic acids,
in which each alkyl or alkenyl moiety advantageously has not more than 6 carbon
atoms. Examples of particular suitable esters include formates, acetates, propionates,
butyrates, acrylates and ethylsuccinates.

Furthermore, the term "pharmaceutically acceptable prodrugs" as used herein
refers to those prodrugs of the compounds of the present invention that are, within the
scope of sound medical judgment, suitable for use in contact with the tissues of
humans and lower animals without undue toxicity, irritation, allergic response, and
the like, commensurate with a reasonable benefit/risk ratio, and effective for their
intended use, as well as the zwitterionic forms, where possible, of the compounds of
the invention. The term "prodrug" refers to compounds that are rapidly transformed in
vivo to yield a particular active compound, for example by hydrolysis in blood. A
thorough discussion is provided in T. Higuchi and V. Stella, "Pro-drugs as Novel
Delivery Systems", Vol. 14 of the A.C.S. Symposium Series, and in Edward B.
Roche, ed., Bioreversible Carriers in Drug Design, American Pharmaceutical
Association and Pergamon Press, 1987, both of which are incorporated herein by
reference.

As mentioned above, the pharmaceutical compositions of the present invention
additionally comprise a pharmaceutically acceptable carrier, which, as used herein,
means a non-toxic, inert solid, semi-solid or liquid filler, diluent, encapsulating
material, or formulation auxiliary of any type. Some examples of materials which can
serve as pharmaceutically acceptable carriers are sugars such as lactose, glucose and
sucrose; starches such as corn starch and potato starch; cellulose and its derivatives
such as sodium carboxymethyl cellulose, ethyl cellulose and cellulose acetate;
powdered tragacanth; malt; gelatin; talc; excipients such as cocoa butter and
suppository waxes; oils such as peanut oil, cottonseed oil, safflower oil, sesame oil,
olive oil, corn oil and soybean oil; glycols such as propylene glycol; esters such as
ethyl oleate and ethyl laurate; agar; buffering agents such as magnesium hydroxide
and aluminum hydroxide; alginic acid; water; isotonic saline; Ringer's solution; ethyl
alcohol; phosphate buffer solutions; and dextrose solutions. Other non-toxic
compatible lubricants such as sodium lauryl sulfate and magnesium stearate, as well
as coloring agents, releasing agents, coating agents, sweetening, flavoring and
perfuming agents, preservatives and antioxidants, can also be present in the
composition, according to the judgment of the formulator.

Liquid dosage forms for oral administration include pharmaceutically
acceptable emulsions, microemulsions, solutions, suspensions, syrups and elixirs. In
addition to the active compounds, the liquid dosage forms may contain inert diluents
commonly used in the art such as, for example, water or other solvents, solubilizing
agents and emulsifiers such as ethyl alcohol, isopropyl alcohol, ethyl carbonate, ethyl
acetate, benzyl alcohol, benzyl benzoate, propylene glycol, 1,3-butylene glycol,
dimethylformamide, oils (in particular, cottonseed, groundnut, corn, germ, olive,
castor, and sesame oils), glycerol, tetrahydrofurfuryl alcohol, polyethylene glycols
and fatty acid esters of sorbitan, and mixtures thereof. Besides inert diluents, the oral
compositions can also include adjuvants such as wetting agents, emulsifying and
suspending agents, sweetening, flavoring, and perfuming agents.

Injectable preparations, for example, sterile injectable aqueous or oleaginous
suspensions, may be formulated according to the known art using suitable dispersing
or wetting agents and suspending agents. The sterile injectable preparation may also
be a sterile injectable solution, suspension or emulsion in a nontoxic parenterally
acceptable diluent or solvent, for example, as a solution in 1,3-butanediol. Among the
acceptable vehicles and solvents that may be employed are water, Ringer's solution,
U.S.P. and isotonic sodium chloride solution. In addition, sterile, fixed oils are
conventionally employed as a solvent or suspending medium. For this purpose any
bland fixed oil can be employed including synthetic mono- or diglycerides. In
addition, fatty acids such as oleic acid are used in the preparation of injectables.

The injectable formulations can be sterilized, for example, by filtration
through a bacterial-retaining filter, or by incorporating sterilizing agents in the form of
sterile solid compositions which can be dissolved or dispersed in sterile water or other
sterile injectable medium prior to use.

In order to prolong the effect of a drug, it is often desirable to slow the
absorption of the drug from subcutaneous or intramuscular injection. This may be
accomplished by the use of a liquid suspension of crystalline or amorphous material
with poor water solubility. The rate of absorption of the drug then depends upon its
rate of dissolution which, in turn, may depend upon crystal size and crystalline form.
Alternatively, delayed absorption of a parenterally administered drug form is
accomplished by dissolving or suspending the drug in an oil vehicle. Injectable depot
forms are made by forming microencapsulated matrices of the drug in biodegradable
polymers such as polylactide-polyglycolide. Depending upon the ratio of drug to
polymer and the nature of the particular polymer employed, the rate of drug release
can be controlled. Examples of other biodegradable polymers include
poly(orthoesters) and poly(anhydrides). Depot injectable formulations are also
prepared by entrapping the drug in liposomes or microemulsions which are
compatible with body tissues.

Compositions for rectal or vaginal administration are preferably suppositories
which can be prepared by mixing the compounds of this invention with suitable
non-irritating excipients or carriers such as cocoa butter, polyethylene glycol or a
suppository wax which are solid at ambient temperature but liquid at body
temperature and therefore melt in the rectum or vaginal cavity and release the active
compound.

Solid dosage forms for oral administration include capsules, tablets, pills,
powders, and granules. In such solid dosage forms, the active compound is mixed
with at least one inert, pharmaceutically acceptable excipient or carrier such as sodium
citrate or dicalcium phosphate and/or a) fillers or extenders such as starches, lactose,
sucrose, glucose, mannitol, and silicic acid, b) binders such as, for example,
carboxymethylcellulose, alginates, gelatin, polyvinylpyrrolidinone, sucrose, and
acacia, c) humectants such as glycerol, d) disintegrating agents such as agar-agar,
calcium carbonate, potato or tapioca starch, alginic acid, certain silicates, and sodium
carbonate, e) solution retarding agents such as paraffin, f) absorption accelerators such
as quaternary ammonium compounds, g) wetting agents such as, for example, cetyl
alcohol and glycerol monostearate, h) absorbents such as kaolin and bentonite clay,
and i) lubricants such as talc, calcium stearate, magnesium stearate, solid polyethylene
glycols, sodium lauryl sulfate, and mixtures thereof. In the case of capsules, tablets
and pills, the dosage form may also comprise buffering agents.

Solid compositions of a similar type may also be employed as fillers in soft
and hard-filled gelatin capsules using such excipients as lactose or milk sugar as well
as high molecular weight polyethylene glycols and the like. The solid dosage forms of
tablets, dragees, capsules, pills, and granules can be prepared with coatings and shells
such as enteric coatings and other coatings well known in the pharmaceutical
formulating art. They may optionally contain opacifying agents and can also be of a
composition that they release the active ingredient(s) only, or preferentially, in a
certain part of the intestinal tract, optionally, in a delayed manner. Examples of
embedding compositions that can be used include polymeric substances and waxes.

The active compounds can also be in micro-encapsulated form with one or
more excipients as noted above. The solid dosage forms of tablets, dragees, capsules,
pills, and granules can be prepared with coatings and shells such as enteric coatings,
release controlling coatings, and other coatings well known in the pharmaceutical
formulating art. In such solid dosage forms the active compound may be admixed
with at least one inert diluent such as sucrose, lactose or starch. Such dosage forms
may also comprise, as is normal practice, additional substances other than inert
diluents, e.g., tableting lubricants and other tableting aids such as magnesium stearate
and microcrystalline cellulose. In the case of capsules, tablets and pills, the dosage
forms may also comprise buffering agents. They may optionally contain opacifying
agents and can also be of a composition that they release the active ingredient(s) only,
or preferentially, in a certain part of the intestinal tract, optionally, in a delayed
manner. Examples of embedding compositions that can be used include polymeric
substances and waxes.

Dosage forms for topical or transdermal administration of a compound of this
invention include ointments, pastes, creams, lotions, gels, powders, solutions, sprays,
inhalants or patches. The active component is admixed under sterile conditions with a
pharmaceutically acceptable carrier and any needed preservatives or buffers as may be
required. Ophthalmic formulations and ear drops are also contemplated as being within
the scope of this invention. The ointments, pastes, creams and gels may contain, in
addition to an active compound of this invention, excipients such as animal and
vegetable fats, oils, waxes, paraffins, starch, tragacanth, cellulose derivatives,
polyethylene glycols, silicones, bentonites, silicic acid, talc and zinc oxide, or
mixtures thereof. Powders and sprays can contain, in addition to the compounds of
this invention, excipients such as lactose, talc, silicic acid, aluminum hydroxide,
calcium silicates and polyamide powder, or mixtures of these substances. Sprays can
additionally contain propellants known in the art such as chlorofluorohydrocarbons.

Transdermal patches have the added advantage of providing controlled
delivery of a compound to the body. Such dosage forms can be made by dissolving or
dispensing the compound in the proper medium. Absorption enhancers can also be
used to increase the flux of the compound across the skin. The rate can be controlled
by either providing a rate controlling membrane or by dispersing the compound in a
polymer matrix or gel.

In yet another aspect, the present invention also provides a pharmaceutical
pack or kit comprising one or more containers filled with one or more of the
ingredients of the pharmaceutical compositions of the invention, and in certain
embodiments, includes an additional approved therapeutic agent for use as a
combination therapy. Optionally associated with such container(s) can be a notice in
the form prescribed by a governmental agency regulating the manufacture, use or sale
of pharmaceutical products, which notice reflects approval by the agency of
manufacture, use or sale for human administration. Instructions for use of the
compound(s) may also be included.

According to the methods of treatment of the present invention, cancer,
particularly breast cancer, is treated or prevented in a patient such as a human or other
mammal by administering to the patient a therapeutically effective amount of a
compound of the invention, in such amounts and for such time as is necessary to
achieve the desired result. By a "therapeutically effective amount" of a compound of
the invention is meant a sufficient amount of the compound to treat (e.g., to ameliorate
the symptoms of, delay progression of, prevent recurrence of, cure, etc.) cancer,
particularly breast cancer, at a reasonable benefit/risk ratio, which involves a
balancing of the efficacy and toxicity of the compound. In general, therapeutic
efficacy and toxicity may be determined by standard pharmacological procedures in
cell cultures or with experimental animals, e.g., by calculating the ED50 (the dose that
is therapeutically effective in 50% of the treated subjects) and the LD50 (the dose that
is lethal to 50% of treated subjects). The ratio LD50/ED50 represents the therapeutic index
of the compound. Although in general drugs having a large therapeutic index are
preferred, as is well known in the art, a smaller therapeutic index may be acceptable in
the case of a serious disease, particularly in the absence of alternative therapeutic
options. Ultimate selection of an appropriate range of doses for administration to
humans is determined in the course of clinical trials.
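
As a rough illustration of the ED50, LD50 and therapeutic index calculation described above, the following sketch interpolates the 50% point from dose-response data and forms the conventional ratio LD50/ED50, so that a larger index means a wider margin between effective and lethal doses. The function names, the log-linear interpolation and the numerical values are assumptions chosen for the example; they are not data or methods from this document.

```python
import math


def dose_at_half_response(doses, fractions):
    """Estimate the dose at which 50% of subjects respond.

    Uses linear interpolation on log10(dose) between the two tested doses
    whose response fractions bracket 0.5 (an illustrative simplification).
    """
    points = sorted(zip(doses, fractions))
    for (d_lo, f_lo), (d_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= 0.5 <= f_hi and f_hi > f_lo:
            t = (0.5 - f_lo) / (f_hi - f_lo)
            log_dose = math.log10(d_lo) + t * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_dose
    raise ValueError("response data do not bracket 50%")


def therapeutic_index(ed50, ld50):
    """Conventional therapeutic index: LD50 divided by ED50."""
    if ed50 <= 0 or ld50 <= 0:
        raise ValueError("doses must be positive")
    return ld50 / ed50


# Hypothetical dose-response data (mg/kg), for illustration only.
ed50 = dose_at_half_response([1, 10, 100], [0.0, 0.5, 1.0])      # efficacy
ld50 = dose_at_half_response([10, 100, 1000], [0.0, 0.5, 1.0])   # lethality
index = therapeutic_index(ed50, ld50)
```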

It will be understood that the total daily usage of the compounds and
compositions of the present invention for any given patient will be decided by the
attending physician within the scope of sound medical judgment. The specific
therapeutically effective dose level for any particular patient will depend upon a
variety of factors including the disorder being treated and the severity of the disorder;
the activity of the specific compound employed; the specific composition employed;
the age, body weight, general health, sex and diet of the patient; the time of
administration, route of administration, and rate of excretion of the specific compound
employed; the duration of the treatment; drugs used in combination or coincidental
with the specific compound employed; and like factors well known in the medical
arts.

The total daily dose of the compounds of this invention administered to a
human or other mammal in single or in divided doses can be in amounts, for example,
from 0.01 to 50 mg/kg body weight or more usually from 0.1 to 25 mg/kg body
weight. Single dose compositions may contain such amounts or submultiples thereof
to make up the daily dose. In general, treatment regimens according to the present
invention comprise administration to a patient in need of such treatment from about
0.1 µg to about 2000 mg of the compound(s) of the invention per day in single or
multiple doses.
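
The weight-based ranges above translate into absolute daily amounts by simple multiplication. The sketch below shows that arithmetic for the "more usual" 0.1 to 25 mg/kg range and checks a total against the stated 0.1 µg to 2000 mg per-day window. The function names and the 70 kg body weight are illustrative assumptions, and nothing here is dosing guidance.

```python
def daily_dose_range_mg(body_weight_kg, low_mg_per_kg=0.1, high_mg_per_kg=25.0):
    """Convert a mg/kg range into a total daily dose range in mg."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg


def per_dose_mg(total_daily_mg, doses_per_day):
    """Split a total daily dose into equal divided doses (submultiples)."""
    if doses_per_day < 1:
        raise ValueError("at least one dose per day is required")
    return total_daily_mg / doses_per_day


def within_regimen_window(total_daily_mg, low_mg=0.0001, high_mg=2000.0):
    """Check a daily total against the 0.1 ug (0.0001 mg) to 2000 mg window."""
    return low_mg <= total_daily_mg <= high_mg


# Hypothetical 70 kg patient, for illustration only.
low, high = daily_dose_range_mg(70.0)
```

For the upper end of this range, `per_dose_mg(high, 2)` gives the amount per dose when the daily total is divided into two administrations.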




EXAMPLES 

Note: A numbered list of references appears following the Examples, all of which are
incorporated herein by reference.

Example 1

Preparation of Microarrays Containing 8498 Human cDNAs 

The human cDNA clones used in this study were obtained from Research Genetics
(Huntsville, AL, USA) as bacterial colonies in 96-well microtiter plates. The clones
were chosen from a set of 15,000 cDNA clones that corresponded to the Research
Genetics Human Gene Filters sets GF200-202 (http://www.resgen.com/). These
clones form part of a set of clones assembled by the I.M.A.G.E. consortium (Lennon,
G.G., Auffray, C., Polymeropoulos, M., Soares, M.B. The I.M.A.G.E. Consortium:
An Integrated Molecular Analysis of Genomes and their Expression. Genomics
33:151-152, 1996) and are identified by I.M.A.G.E. clone ID numbers. All clones
printed on these arrays were sequence validated as part of a product offered at
Research Genetics, Inc. We estimate that greater than 97% of the clones on the array
are correctly identified.

A detailed protocol for the production of the cDNA microarrays used in this
study is available at http://cmgm.stanford.edu/pbrown/protocols.html and is
reproduced below with insubstantial changes. As described below, the protocol
includes steps of (1) cleaning the glass slides onto which the DNAs (e.g., products of
PCR reactions) are to be spotted; (2) spotting the DNAs onto the glass slides with an
arrayer; and (3) post processing to prepare arrays containing spotted DNAs for
hybridization. All procedures are done at room temperature and with double distilled
water unless otherwise stated. Unless otherwise stated, in this Example and the
following Examples, reagents are prepared according to protocols available in
Maniatis, T., Sambrook, J. and Fritsch, E., Molecular Cloning: A Laboratory Manual
(3 Volume Set), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1989.






Cleaning Slides 

Use 30-slide racks in 350 mL glass dishes.

1. Dissolve 50 g of NaOH pellets into 150 mL ddH2O.

2. Add 200 mL of 95% EtOH, stir until completely mixed.

3. If solution remains cloudy, add ddH2O until clear.

4. Pour solution into glass slide box.

5. Drop in 30 slides in a metal rack. (Gold Seal slides, Cat. 3010)

6. Let soak on an orbital shaker for at least two hours.

7. Rinse slides by transferring rack to slide dish filled with ddH2O.

8. Repeat ddH2O rinses x3. It's important to remove all traces of the
NaOH-ethanol.

9. Prepare poly-L-lysine solution: Use Sigma poly-L-lysine solution, Cat. No.
8920.

10. Add 70 mL poly-L-lysine to 280 mL of water.

11. Transfer slides to poly-L-lysine solution and let soak for 1 hour.

12. Remove excess liquid from slides by spinning the rack of slides on microtiter
plate carriers at 500 rpm.

13. Dry slides at 40 degrees C for 5 minutes in a vacuum oven.

14. Store slides in a closed box for at least two weeks prior to use.

15. Before printing arrays, check a sample slide to make sure it's hydrophobic
(water should bead off it) but the lysine coating is not turning opaque.

Arraying 

1. Transfer PCR reactions to 96-well V-bottom tissue culture plates (Costar).
Add 1/10 vol. 3M sodium acetate (pH 5.2) and equal volume isopropanol. Store at -20
C for a few hours.

2. Centrifuge in Sorvall at 3500 RPM for 45 min. Rinse with 70% EtOH,
centrifuge again and dry.

3. Resuspend DNA in 12ul 3X SSC for a few hours and transfer to flexible
U-bottom printing plates.

4. Spot DNA onto poly-l-lysine slides with an arrayer.

Post processing

1. Rehydrate arrays by suspending slides over a dish of warm ddH2O. (~1
minute)

2. Snap-dry each array (DNA side up) on a 100C hot plate for 3 seconds.

3. UV cross-link DNA to the glass by using a Stratalinker set for 60 milliJoules

4. Dissolve 5g of succinic anhydride (Aldrich) in 315mL of
n-methyl-pyrrolidinone.

5. To this, add 35mL of 0.2M NaBorate pH 8.0 (made by dissolving boric acid in
water and adjusting the pH with NaOH), and stir until dissolved.

6. Soak arrays in this solution for 15 minutes with shaking.

7. Transfer arrays to 95C water bath for 2 minutes

8. Quickly transfer arrays to 95% EtOH for 1 minute.

9. Remove excess liquid from slides by spinning the rack of slides on microtiter
plate carriers at 500rpm.

10. Arrays can be used immediately.

Reagent Suppliers 

Microscope slides: Gold Seal brand (Cat. 3010)
Poly-l-lysine solution: Sigma product number P8920
Succinic Anhydride: Aldrich product number 23,969-0
N-Methyl-Pyrrolidinone: Aldrich product number 32,863-4

Microarrays were prepared according to the above protocol using the 8498 
cDNA clones described above. All microarrays used in the experiments described 
herein were from a single print run batch of microarrays. 



Example 2

Cell Lines, Breast Tissue, and Breast Tumor Samples for Microarray Analysis and
Preparation of mRNA Samples

Common Reference Sample

Each of the 84 experimental samples tested here was analyzed by a
comparative hybridization, using a common reference RNA pool as a standard; this
reference sample was composed of equal mixtures of mRNA isolated from 11
established cell lines derived from human tissue (MCF7, Hs578T, OVCAR3, HepG2,
NTERA2, MOLT4, RPMI-8226, NB4+ATRA, UACC-62, SW872, and Colo205; also
see Table 3 for more details). The 11 cell lines were all grown to 70-90% confluence
in RPMI medium containing 10% Fetal Calf Serum and Penicillin/Streptomycin. The
cells were harvested either by scraping or centrifugation, quickly resuspended in RNA
lysis buffer, and mRNA was prepared using the FastTrack™ 2.0 mRNA Isolation Kit
(Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. In each case,
multiple individual mRNA preparations were collected for each cell line, which were
then pooled together and analyzed via Northern analysis before final mixing to ensure
the quality of the input mRNAs (e.g., to confirm that the mRNA exhibited a size
distribution indicating that it was substantially nondegraded). The 11 mRNA samples
were then mixed together in equal amounts, aliquoted in 10mM Tris (pH 7.4), and stored
at -80 C until use (2 micrograms of common reference sample was used per
microarray hybridization and was always labeled using Cy3).

Normal Breast Tissue

Three samples of normal breast tissue were analyzed. Two of the samples
were obtained from Clontech (Palo Alto, CA) and were pools of six (Normal1) or two
(Normal2) whole normal breasts. The third sample (Normal3) was obtained from a
single individual.

Breast Tumor Samples 

The 40 individual breast tumor samples were collected at either Stanford
University in Stanford, CA, USA, or at the Haukeland University Hospital in Bergen,
Norway. Twenty of the forty breast tumors were sampled twice as part of a larger
Norwegian study on locally advanced breast cancers (T3/T4 and/or N2 tumors) and
have been described previously (Aas, T., et al., Nat. Med., 2, 811-814, 1996, the
contents of which are incorporated herein by reference); these patients underwent an
open surgical biopsy before treatment with doxorubicin monotherapy (range 12-23
weeks), followed by the definitive surgical resection of the remaining tumor after
therapy, and were evaluated for clinical responses according to UICC criteria
(Hayward, J., et al., Br. J. Cancer, 35, 292-298, 1977). In addition to the 20 pairs,
there were 8 additional "before" specimens from Norway and 10 tumor specimens
from Stanford (all Stanford tumors tested had a diameter of 3 cm or larger). Finally, 2
of the 10 Stanford tumor specimens assayed were also paired with a lymph node
metastasis from the same patient.



mRNA Isolation from Breast Tumor and Tissue Samples 

Following their excision, breast tumor samples were rapidly frozen in liquid
N2 and then stored at -80 C until use. mRNA was isolated from breast tumors and
normal breast tissue using the Trizol Reagent (Gibco-BRL) and Invitrogen FastTrack
2.0 Kit (all Stanford samples; see http://genome-www.stanford.edu/sbcmp/web.shtml
for the detailed protocol) or using the Trizol
Reagent followed by Dynal bead separation for the mRNA purification step (all
Norway tissue samples). Briefly, frozen tumor samples were cut into small pieces and
immediately placed into 12 ml of Trizol Reagent. Each tumor sample in Trizol was
homogenized using a PowerGen 125 Tissue Homogenizer (Fisher Scientific), and
total RNA was isolated according to the Trizol reagent manufacturer's protocol.
Tumor mRNA was isolated according to the manufacturer's protocols using the
FastTrack 2.0 Kit (Invitrogen) or Dynal beads.

Example 3

Characterization of Breast Tissue and Tumor Samples

For all but two of the tumor specimens (i.e. New York 1 and New York 2), the
mutational status of the TP53 gene was determined using published methods (Aas, T.,
et al.).

A single pathologist (applicant Matt van de Rijn) reviewed hematoxylin and
eosin (H&E) sections of each tumor, including all before and after pairs, and made a
histological evaluation of each while blinded to the source. Tumors were graded using
a modified version of the Bloom-Richardson method (Robbins, P., et al., Hum Pathol,
26, 873-879, 1995). These data are displayed in Table 4. Representative H&E
sections of each tumor are posted on Applicants' website at
http://genome-www.stanford.edu/molecularportraits/.

Immunohistochemistry was performed as described previously (Perou, C., et
al., 1999; Bindl, J. and Warnke, R., Am J Clin Pathol, 85, 490-493, 1986, and
Natkunam, Y., et al., Am. J. Path., 156(1), 2000). The antibodies used included the
commercially available monoclonal antibodies CAM5.2 (specific for keratins 8/18,
available from Becton Dickinson), anti-keratin 5/6 (available originally from
Boehringer Mannheim, Indianapolis, IN, cat. no. 1273396 and now from Chemicon
International, Temecula, CA), anti-keratin 17 (clone E3, available from Dako,
Carpinteria, CA, cat. no. M7046), anti-CD3 (available from Dako), and anti-immunoglobulin
light chain (A191, A193, available from Dako). These
immunohistochemical methods were applied for all the immunohistochemical studies
described in the present application unless otherwise stated. Results are presented in
Figure 3 and are described in further examples as appropriate.

Example 4

cDNA Synthesis and Labeling and Microarray Hybridization 

mRNA was isolated from breast tissue, breast tumor samples, and cell lines as
described in Example 2. Fluorescently labeled cDNA was synthesized from the
mRNA using a reverse transcriptase reaction that included dUTP labeled with either
Cy3 or Cy5. For each hybridization experiment, differentially labeled cDNA samples
(an experimental sample and a reference sample) were pooled and hybridized to a
cDNA microarray, which was then scanned as described in Example 5. The protocol
below provides details of the steps performed for cDNA synthesis and labeling and
for microarray hybridization.
1. To set up for the reverse transcriptase (RT) reaction, combine the following (e.g., in
an Eppendorf tube):

(a) Anchored Oligo dT primer - 2 microliters at 2.5 micrograms/microliter or
control - 2 microliters.

(b) mRNA - (whatever volume is needed to reach 1.5-2 micrograms)

(c) DEPC/H2O - add sufficient volume so that final volume is 16 microliters

2. Heat at 70° C for 10 minutes

3. Chill on ice for 1-2 minutes

4. Add the following RT reaction components to each individual tube:

(a) 5X RT Buffer - 6 microliters

(b) 50X dNTPs - 0.7 microliters - (500 µM A,C,G, 200 µM T)

(c) Cy Dyes dUTP - 3 microliters - (either Cy3 or Cy5)

(d) DTT Stock - 3 microliters - (comes with RT setup)

(e) Superscript II RT - 1.7 microliters - (cat# 18064-014 Gibco-BRL)

5. Mix well 

6. Incubate at 42° C for 1 hour 

7. Add another 1 microliter of Superscript II RT and mix 

8. Incubate at 42° C for 1 more hour 

9. Degrade mRNA with 1.5 microliters of 1M NaOH / 2mM EDTA

10. Incubate at 65° C for 8 minutes (do NOT go TOO long here)

11. Add 15 microliters of 0.1M HCl

12. Add 450 microliters of TE (pH 7.4) to each sample and place each sample into a 
microcon-30 filter. 

13. Add 15 microliters of Human COT1 DNA (Gibco-BRL = 1 microgram/microliter)
to each sample in the microcon filter.

14. Spin in Eppendorf centrifuge until volume equals about 50 microliters (8-10 minutes)

15. Remove flowthroughs, and pool Cy3 and Cy5 flowthroughs together for future
recovery of Cy dyes (store at -20° C).

16. Invert microcons, recover labeled samples, and pool Cy3 and Cy5 samples
together that will be used for an individual experiment, in a single microcon filter that
was used in step 15.

17. Add 500 microliters of TE again, and spin until final volume equals 8 microliters
or less (BE VERY CAREFUL TO NOT SPIN THE SAMPLE DRY!!!)

18. To the 8 microliter combined Cy3 + Cy5 sample, add the following:

(a) Yeast tRNA - 1 microliter - (10 micrograms/microliter)

(b) PolyA DNA - 2 microliters - (10 micrograms/microliter)

(c) 20XSSC - 2 microliters - (FINAL SSC concentration approximately 3X)

(d) 10% SDS - 0.3 microliters

FINAL VOLUME = 13.3 MICROLITERS

19. Mix well.

20. Heat sample at 100° C for 2 minutes, spin very briefly.

21. Place samples at 42° C for 20-30 minutes.

22. During Step 21, prepare the necessary number of hybridization chambers (Custom
made by Die-Tech, San Jose, CA (see "Drawings for custom parts" at
http://cmgm.stanford.edu/pbrown/mguide/HybChamber.pdf) or purchased at Corning
Costar, Acton, MA (CTM™ Hybridization Chamber, #2551)), get 22mm X 22mm
coverslips ready, and get arrays ready.

23. Add the 13 microliters of probe (i.e., labeled cDNA mixture) onto the center of the
array while NOT actually touching the array face with the pipette tip.

24. Quickly and gently place the 22mm X 22mm glass #1 coverslip onto the array
face.

25. Add about 15-20 microliters of 3XSSC in two drops onto the end of the array slide
away from the actual array for hydration purposes.

26. Assemble the hybridization chamber with the array slide in it, and place into a 65
C water bath overnight.

27. Pull out the hybridization chamber and dry off the excess H2O.

28. Disassemble the hybridization chamber, and quickly place the slides into a slide
washing chamber that contains 2XSSC/0.05%SDS. Jiggle the slide holder up and
down until the slide coverslip falls off. Repeat this individually for each array, one at
a time, until all are done.

29. Wash slides in 1XSSC for 3-5 minutes.

30. Wash slides in 50 C 0.2XSSC for 3-5 minutes, twice.

31. Spin slides down in centrifuge at 200 RPM for 2 minutes.

32. Scan immediately.
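
The volumes listed in steps 1, 4, and 18 can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the original protocol; the dictionary and function names are hypothetical, and the only inputs are the volumes and stock concentrations stated in the steps above.

```python
# Volumes in microliters, copied from steps 1, 4 and 18 of the protocol.
RT_SETUP_UL = 16.0  # primer + mRNA + DEPC-treated water (step 1)
RT_ADDITIONS_UL = {  # components added in step 4
    "5X RT buffer": 6.0,
    "50X dNTPs": 0.7,
    "Cy3- or Cy5-dUTP": 3.0,
    "DTT stock": 3.0,
    "Superscript II RT": 1.7,
}
HYB_MIX_UL = {  # components combined in step 18
    "pooled Cy3 + Cy5 cDNA": 8.0,
    "yeast tRNA": 1.0,
    "polyA DNA": 2.0,
    "20X SSC": 2.0,
    "10% SDS": 0.3,
}


def total_volume(components):
    """Sum the volumes of all components in a mixture."""
    return sum(components.values())


def final_concentration(stock, stock_volume, total):
    """Dilution rule: C_final = C_stock * V_stock / V_total."""
    return stock * stock_volume / total


rt_total = RT_SETUP_UL + total_volume(RT_ADDITIONS_UL)
hyb_total = total_volume(HYB_MIX_UL)
ssc_final_x = final_concentration(20.0, HYB_MIX_UL["20X SSC"], hyb_total)
sds_final_pct = final_concentration(10.0, HYB_MIX_UL["10% SDS"], hyb_total)

print(f"RT reaction volume: {rt_total:.1f} microliters")
print(f"Hybridization mix: {hyb_total:.1f} microliters, "
      f"SSC {ssc_final_x:.2f}X, SDS {sds_final_pct:.2f}%")
```

The computed SSC concentration agrees with the "approximately 3X" stated in step 18(c), and the hybridization volume agrees with the stated 13.3 microliters.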

Example 5 

Collection, Processing, and Analysis of Data from Microarray Hybridizations

The cDNA microarrays were scanned with either a General Scanning
(Watertown, MA) ScanArray 3000 at 20 micron resolution, or with a prototype Axon
Instruments (Foster City, CA) GenePix Scanner at 10 micron resolution. The output
files, which were TIFF images, were then analyzed using the program ScanAlyze (M.
Eisen; available at http://www.microarrays.org/software). Fluorescent ratios and
quantitative data on spot quality (see ScanAlyze manual) were stored in a prototype of
the AMAD database (M. Eisen; available at http://www.microarrays.org/software).
Areas of the array with obvious blemishes were manually flagged and excluded from
subsequent analyses. The primary data tables can be downloaded at
http://genome-www.stanford.edu/molecularportraits/ in text/tab delimited format after
obtaining a password.

Data were extracted from the database in a single table, with each row
representing an array element, each column a hybridization, and each cell the
observed fluorescent ratio for the array element in the appropriate hybridization.
Previously flagged spots were excluded, as were spots that did not pass quality
control. This table had 9216 rows and 84 columns. Array elements were removed if
they were not well measured in at least 80% of the hybridizations. The data table was
split into tumors and cell lines, and the two subtables were separately median polished
(the rows and columns were iteratively adjusted to have median 0) before being
rejoined into a single table. Genes whose expression varied by at least 4-fold from the
median in this sample set in at least three of the samples tested were selected for the
analyses described in the Detailed Description and in Examples 6 and 7 (1753 genes
satisfied these conditions).
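
The three processing steps described above (removal of poorly measured array elements, median polishing, and selection of genes varying at least 4-fold in at least three samples) can be sketched in Python as follows. This is an illustrative sketch and not the code actually used; it assumes that ratios have been converted to log2 values (so that "median 0" corresponds to a ratio of 1), that missing or flagged measurements are represented by None, and that a fixed number of polishing iterations suffices. All function names are hypothetical.

```python
import math
from statistics import median


def well_measured(rows, min_fraction=0.8):
    """Keep rows that have a value in at least min_fraction of the columns."""
    return [r for r in rows
            if sum(v is not None for v in r) / len(r) >= min_fraction]


def median_polish(rows, iterations=10):
    """Alternately subtract row medians and column medians (None is skipped)."""
    table = [list(r) for r in rows]
    for _ in range(iterations):
        for r in table:
            present = [v for v in r if v is not None]
            if present:
                m = median(present)
                for j, v in enumerate(r):
                    if v is not None:
                        r[j] = v - m
        for j in range(len(table[0])):
            present = [r[j] for r in table if r[j] is not None]
            if present:
                m = median(present)
                for r in table:
                    if r[j] is not None:
                        r[j] -= m
    return table


def select_variable(rows, fold=4.0, min_samples=3):
    """Keep rows whose log2 value is at least log2(fold) away from 0
    (the median after polishing) in at least min_samples columns."""
    threshold = math.log2(fold)
    return [r for r in rows
            if sum(v is not None and abs(v) >= threshold for v in r)
            >= min_samples]


# Hypothetical log2 ratios: 3 array elements x 5 hybridizations.
raw = [
    [1.0, None, None, None, 2.0],   # measured in 2 of 5 columns: removed
    [2.5, -2.0, 3.0, 0.1, 0.0],     # varies at least 4-fold in 3 columns
    [0.2, 0.1, -0.3, 2.2, None],    # varies at least 4-fold in 1 column
]
kept = well_measured(raw)
variable = select_variable(kept)
```

In the procedure described above, the tumor and cell line subtables were polished separately before being rejoined; in this sketch that would correspond to calling median_polish once per subtable.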

Average-linkage hierarchical clustering, as implemented in the program
Cluster (M. Eisen; http://www.microarrays.org/software), was applied separately to
both the genes and arrays. The results were analyzed, and images generated, using
TreeView (M. Eisen; http://www.microarrays.org/software).
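
For readers unfamiliar with average-linkage clustering, the following Python sketch illustrates the general technique. It is not the Cluster program and may differ from it in detail; it assumes that the distance between two profiles is one minus their Pearson correlation and that the distance between two clusters is the mean of all pairwise distances between their members. The sample names and values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(profiles):
    """Cluster named profiles; return merges as (members_a, members_b, distance)."""
    names = list(profiles)
    # Distance between two single profiles: 1 - Pearson correlation.
    dist = {frozenset(p): 1.0 - pearson(profiles[p[0]], profiles[p[1]])
            for p in combinations(names, 2)}

    def linkage(pair):
        # Average linkage: mean distance over all cross-cluster member pairs.
        a, b = pair
        return mean(dist[frozenset((i, j))] for i in a for j in b)

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical log2 ratios for four samples over four genes.
example = {
    "tumorA_before": [1.0, 2.0, 3.0, 4.0],
    "tumorA_after": [1.1, 2.0, 3.2, 3.9],
    "tumorB": [4.0, 3.0, 2.0, 1.0],
    "tumorC": [4.0, 3.1, 1.9, 1.2],
}
merges = average_linkage(example)
for left, right, distance in merges:
    print(left, "+", right, f"at distance {distance:.3f}")
```

The order of merges and the distances at which they occur define the dendrogram; samples joined early at small distances appear on adjacent terminal branches.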

Example 6

Molecular Portraits of Tumors Based on Variation in Expression of 1753 Genes

Methods

A hierarchical clustering method (Eisen, 1998) was used to group 1753
differentially expressed genes (i.e., those genes whose expression varied by at least
4-fold from the median in the sample set in at least three of the samples tested) based on
similarity in the pattern with which their expression varied over all samples. The
same clustering method was used to group the experimental samples (tissues and cell
lines separately) based on the similarity in their patterns of expression. The data are
conveniently presented in a matrix format, with each row representing a single gene,
and each column representing an experimental sample. The ratio of the abundance of
transcripts of each gene, in each sample, to the median abundance of the gene's
transcript among all the cell lines (left panel) or to its median abundance across all the
clinical samples (right panel) is represented by the color of the corresponding cell in
the matrix. Green squares represent transcript levels below the median; black squares
represent transcript levels equal to the median; red squares represent transcript levels
greater than the median; gray squares indicate technically inadequate or missing data.
The color saturation reflects the magnitude of the ratio relative to the median for each
set of samples (see scale at bottom left). In all images the brightest red color
represents transcript levels at least 16-fold greater than the median, and the brightest
green color represents transcript levels at least 16-fold below the median.
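
The color scheme described above can be expressed as a simple mapping from a ratio to a color. The following Python sketch is illustrative only; it assumes that saturation increases linearly with the log2 of the ratio (the text above specifies only the endpoints at 16-fold), and the function name and the particular gray value are hypothetical.

```python
import math


def ratio_to_rgb(ratio, max_fold=16.0):
    """Map a transcript ratio (relative to the median) to an (R, G, B) tuple.

    Missing or non-positive ratios map to gray, a ratio of 1 to black,
    ratios above 1 to red and ratios below 1 to green. Saturation is
    capped at max_fold above or below the median.
    """
    if ratio is None or ratio <= 0:
        return (128, 128, 128)  # technically inadequate or missing data
    level = math.log2(ratio) / math.log2(max_fold)
    level = max(-1.0, min(1.0, level))  # clamp to the 16-fold limits
    intensity = round(abs(level) * 255)
    return (intensity, 0, 0) if level > 0 else (0, intensity, 0)


for r in (None, 1 / 16, 1.0, 16.0, 64.0):
    print(r, ratio_to_rgb(r))
```

Ratios beyond 16-fold in either direction receive the same color as a 16-fold ratio, consistent with the statement that the brightest colors represent "at least" 16-fold differences.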

Results 

(i) Molecular Portraits of Tumors 

Three striking general features of the tumors' gene expression patterns are
evident in Appendices A and D. First, the breast tumors show remarkable variation in
their patterns of gene expression. Second, this variation is multidimensional, that is,
many different sets of genes show largely independent patterns of variation. Third,
the patterns of gene expression have a pervasive order reflecting relationships among
the genes, relationships among the tumors, and connections between specific genes
and specific tumors.

The hierarchical clustering algorithm organized the experimental samples
based only on overall similarity in their gene expression patterns; relationships among
the experimental samples are summarized in a dendrogram, in which the pattern and
length of the branches reflect the relatedness of the samples (Eisen, M., et al., 1998).
Fifteen of the 20 pairs of samples taken from the same tumor before and after
doxorubicin chemotherapy (red dendrogram branches), and both pairs of samples
taken from a primary tumor and an associated lymph node metastasis (blue branches),
were clustered together on adjacent terminal branches in the dendrogram. The three
clustered normal breast samples are highlighted in green. The branches representing
the four breast luminal epithelial cell lines are displayed in pink; breast basal
epithelial cell lines are displayed in orange, the endothelial cell lines in blue, the
mesenchymal-like cell lines in dark green, and the lymphocyte-derived cell lines in
dark red.

Application of the clustering method to the samples and genes identified the
two members of each primary tumor/metastasis pair as being closely related to one
another based on similarity in gene expression. Thus this method can provide
information useful in determining whether a tumor sample obtained from a second
tumor is a metastasis originating from a first tumor or is an independent primary
tumor. In addition, despite the potential confounding effects of an interval of 16
weeks, independent surgical procedures, and cytotoxic chemotherapy, the independent
samples taken from the same tumor before and after chemotherapy were in most cases
recognizably more similar to each other in their overall pattern of gene expression
than either was to any of the other samples.
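
The statement that two samples are "more similar to each other than either was to any of the other samples" can be illustrated with a mutual nearest-neighbor check. The Python sketch below is an illustration under stated assumptions and is not the analysis performed here, which relied on position in the dendrogram; it assumes Pearson correlation as the similarity measure, and the sample names and values are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def most_similar(name, profiles):
    """Return the name of the other sample best correlated with `name`."""
    others = [n for n in profiles if n != name]
    return max(others, key=lambda n: pearson(profiles[name], profiles[n]))


def pair_is_matched(a, b, profiles):
    """True if a and b are each other's most similar sample."""
    return most_similar(a, profiles) == b and most_similar(b, profiles) == a


# Hypothetical log2 ratios for four samples over four genes.
samples = {
    "T1_before": [0.5, 2.0, -1.0, 3.0],
    "T1_after": [0.4, 1.8, -0.7, 2.5],
    "T2": [2.0, -1.0, 0.5, -2.0],
    "T3": [-1.0, 0.2, 2.5, 0.1],
}
print(pair_is_matched("T1_before", "T1_after", samples))
print(pair_is_matched("T1_before", "T2", samples))
```

A before/after pair that fails such a check would correspond to one of the unmatched pairs discussed in the following paragraph.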

Closer examination of the five before and after pairs that were not matched by
the clustering algorithm provided further insight. In three instances, the after
chemotherapy specimens (i.e. Norway 47, 61, and 101) were clustered into a branch
of the dendrogram that contained the three normal breast samples along with five
additional tumor samples; we know from the clinical data that these three tumors were
all classified as doxorubicin responders (Table 5 and Aas, T., et al.). Thus, in most
cases, independent tumor biopsies from the same individual could be recognized as
such solely on the basis of gene expression patterns. This implies that the patterns of
gene expression are homogeneous and stable in each breast tumor, and yet
sufficiently diverse between tumors, so that they can be viewed as molecular portraits
of each tumor.

(ii) Specific Properties of the Tumors 

The molecular portraits revealed in the patterns of gene expression not only
uncovered similarities and differences among the tumors but, in many cases, pointed
to a biological interpretation. As discussed below, variation in growth rate, in the
activity of specific signaling pathways, and in the cellular composition of the tumors
were all reflected in corresponding variations in the expression of specific subsets of
genes.

Growth and Proliferation. The largest distinct subset of genes among the 1753 genes
was the proliferation subset, which is a group of approximately 120 genes whose level
of expression correlates with cellular proliferation rates (see Perou, C., et al., 1999;
Ross, D., et al., Nature Genetics, 24(3): 227-35, 2000). Expression of this subset of
genes varied widely among the tumor samples, and was generally well correlated with
a standard pathological index of tumor cell proliferation, namely the mitotic index.
The mitotic grade of each tumor, as determined by evaluating mitotic index, is
displayed in a color-coded format below the tumor name, with green indicating
mitotic grade 1, black indicating mitotic grade 2, red indicating mitotic grade 3, and
gray indicating that mitotic grade was not evaluated. The growth and proliferation
cluster also included the genes encoding two widely used immunohistochemical
markers of cell proliferation (Ki-67 and PCNA, names in blue/purple letters).

Diverse proliferation-related functions are represented in the genes comprising
this subset, including macromolecular synthesis, cell-cycle regulation, mitosis and
cytokinesis. Many genes in which alterations in sequence or expression are
associated with tumorigenesis were also found in this gene subset, in particular,
numerous genes implicated in chromosomal instability and/or aneuploidy (names in
pink letters) (reference 22). These genes included the spindle checkpoint gene hBUB1
(reference 23), the human MAD2 homologue (reference 24), the STK15/IPL1 kinase
(reference 25), and the PLK1/HSTPK13 kinase (reference 26).

The importance of this clustered set of genes in cancer biology is further
highlighted by its inclusion of genes encoding the molecular targets of widely used
anticancer agents (names in orange letters), including both subunits of ribonucleotide
reductase, topoisomerase II alpha, and dihydrofolate reductase. The many
uncharacterized genes in this subset, therefore, are candidates for important roles in
the regulation and execution of the cell's program for growth and proliferation, and
potential targets for oncogenic mutations or antiproliferative drugs. Thus the
clustering method, by generating a set of genes known to be involved in proliferation
and/or known to be targets for antiproliferative drugs and further identifying a set of
unknown genes whose expression patterns cause them to fall within the subset,
identifies potential targets for the development of new chemotherapeutic agents.



Variation in signaling pathways. Several groups of co-expressed genes provided
views of the activities of specific signaling and/or regulatory systems.

(a) Interferon signaling: A large subset of genes known to be regulated by the
interferon pathway (including STAT1) showed substantial variation in expression
among the tumors.

(b) Estrogen receptor: Variation in expression of the estrogen receptor alpha gene
(ESR1) correlated well with the direct clinical measurement of the estrogen receptor
protein levels in the tumors (Table 5, concordance in 36/38 tumors tested), and
paralleled variation in the expression of a larger group of genes that included three
other transcription factors (GATA-binding protein 3, X-box binding protein 1 and
hepatocyte nuclear factor 3 alpha) (see also references 27 and 28). In a specific subset
of the estrogen receptor positive tumors, the BCL2 gene and two previously known
estrogen regulated genes (LIV1 and trefoil factor 1 (reference 29)) were also highly
expressed (see Appendices C and D). The regulatory program reflected in the expression of this
BCL2-containing subset of genes may play an important role in the clinical course of a
breast tumor, as the loss of expression of the estrogen receptor is known to be
associated with a poor prognosis (reference 17), while high levels of expression of both BCL2 and
ESR1 are associated with a more favorable prognosis (references 30, 31).
(c) Erb-B2: HER2/neu, also known as the Erb-B2 oncogene, is a gene whose aberrant
expression is thought to contribute to tumorigenesis in the breast (reference 16). The Erb-B2
receptor-tyrosine kinase is known to be overexpressed in 20-30% of all breast tumors,
usually associated with DNA amplification of the chromosomal locus (17q12-q22)
that contains the ERB-B2 gene (references 32, 33). Interestingly, most of the other genes contained
within the Erb-B2 cluster
were also located in this same small region of Chromosome 17. These expression
data suggested, and the results of microarray comparative genomic hybridization
confirmed, that these other closely linked genes were also amplified on the genomic
DNA level and, consequently, overexpressed on the mRNA level in tumors with an
amplified Erb-B2 gene (references 33-35).

(d) Fos/Jun Signaling: A subset of genes that included c-Fos, JunB, and other genes
involved in the "immediate-early" response to serum co-varied in expression among
the tumor specimens; these genes were most highly expressed in the three normal
breast samples. Applicants have found that this set of genes is characteristically
induced by prolonged handling of the samples following surgical resection. The
observed variation in the expression of this set of genes may therefore reflect variation
in post surgical handling rather than true in vivo differences.



Example 7

Identification of Cell Type Specific Components Within Tumors Based on Variation
in Expression of 1753 Genes

Methods and Rationale 

Clustering was performed as described in the previous Example. The resulting
dendrogram and matrix were used to identify gene expression patterns indicative of
the presence of certain cell types within the samples. Human breast tumors are
histologically complex tissues, containing a variety of cell types in addition to the
carcinoma cells (reference 18). In analyzing the gene expression patterns in tumors and tissues,
two lines of reasoning were used to infer the lineage of the cells that accounted for
apparently cell-type specific expression of particular clustered groups of genes. First,
such gene subsets usually included genes whose expression patterns have been well
characterized by previous workers, and have consistently pointed to a specific cell
type. Second, these inferences were often corroborated by observing comparable
expression of the same group of genes in one or more of the cultured cell lines
(reference 21). Some of the prominent patterns of gene expression that appear, on this
basis, to indicate the variable abundance of particular cell types in these tissue
samples are summarized below.

Immunohistochemistry was performed as described in Example 3. 

Results 

At least eight subsets of genes appeared to reflect variation in specific cell
types present within the tumors. The notion that developmental lineage has a
pervasive influence on gene expression patterns is highlighted by the clustering
pattern of the cultured cell lines. For example, the three lymphocyte cell lines
comprise one branch, the two endothelial cell lines constitute another, and the
mesenchymal cell lines form a third. Cell lines derived from two distinct types of
breast epithelial cells (basal and luminal) also formed distinct dendrogram branches.
Some of the prominent patterns of gene expression that appear to indicate the variable
abundance of particular cell types within a tumor sample are summarized in the
remainder of this Example.


(a) Endothelial cells: A subset of genes characteristically expressed by endothelial
cells, including CD34, CD31 and von Willebrand Factor (references 36, 37), were also strongly
expressed in the two endothelial cell lines HUVEC and HMVEC. Variation among
the tumor samples in the abundance of transcripts from this subset of genes may
therefore reflect variation in the vascularity or angiogenic activity within the tumors.

(b) Stromal cells: A previously characterized subset of genes that included multiple
isoforms of collagen and other genes encoding extracellular matrix components, many
of which are characteristically expressed by mesenchymal cells, showed significant
variation in expression among the tumor samples (references 8, 21).

(c) Adipose-Enriched/Normal Breast: A subset of genes that included fatty acid
binding protein 4 and PPARγ may represent the presence of adipose cells in the tumor
samples (references 38, 39). This subset of genes was most highly expressed in the three normal breast
samples. As we have no cell line guide for this cluster, the exact nature of the cell type
underlying expression of these genes cannot be unequivocally determined.

(d) B-lymphocytes: Variation in expression of a subset of genes that were highly
expressed in RPMI-8226 (a multiple myeloma-derived cell line), including many
immunoglobulin genes, appears to represent variable B-cell infiltration of the tumors.
This interpretation was corroborated by immunohistochemistry (references 8, 21).

(e) T-lymphocytes: One subset of co-expressed genes included CD3 and two subunits
of the T-cell receptor. Most of the genes in this subset were expressed at their highest
levels in the T-cell leukemia derived cell line, MOLT-4. Variation in expression of
this subset of genes was therefore interpreted as representing variation in
T-lymphocyte populations in the tumors. Immunohistochemical staining of tumor
samples, using anti-CD3 antibodies, confirmed that tumors with the highest levels of
expression of this subset of genes contained numerous CD3-positive lymphocytes
(Figure 3b).

(f) Macrophages: A subset of genes that appeared to be markers of
macrophage/monocyte populations included CD68, acid phosphatase 5, chitinase, and
lysozyme. Interestingly, the transcripts for these genes were the most abundant in the
three after chemotherapy tumor samples that clustered apart from their before
counterparts (i.e. Norway 47, 61, and 101). These three tumors, all of which had
responded to the chemotherapy, were thus notable not only for an overall gene
expression pattern resembling that of normal breast tissue, but also for a particularly
large population of macrophages, perhaps representing a secondary response to tumor
necrosis.

(g) Basal and Luminal Epithelial Cells of the Mammary Duct, and Their Malignant
Counterparts: Two distinct kinds of epithelial cells are found in the adult human
mammary gland, basal (and/or myoepithelial) cells and luminal epithelial cells (references 18, 40).
These two cell types are conveniently distinguished immunohistochemically; basal
epithelial cells can be stained with antibodies to keratin 5/6 (Figure 3c), while luminal
epithelial cells stain with antibodies against keratins 8/18 (Figure 3c). Many genes
were expressed by one of these two cell lines, but not by the other. The gene
expression subsets characteristic of basal epithelial cells included several genes that
have previously been shown to play important roles in this cell type, e.g., keratin 5,
keratin 17, integrin-β4 and laminin (reference 18). The gene expression subset characteristic of
luminal cells was anchored by the previously noted subset of transcription factors that
included the estrogen receptor gene.



Example 8

Classification of Breast Tumors Using an Optimized Set of Genes Showing
Differential Expression Between Tumors

Methods and Rationale

As described in Examples 6 and 7, analysis of genes that are differentially
expressed in breast tumor samples provides an indication of the relatedness of the
samples and allows identification of samples taken from the same tumor or members
of a tumor/metastasis pair. Such analysis further provides insight into specific tumor
properties such as variation in growth rate, activity of specific signaling pathways, and
the cellular composition of the tumors. The subset of genes analyzed in Examples 6
and 7 was selected solely based upon the fact that genes in the subset were
differentially expressed among the experimental samples. Recognizing that the
choice of genes whose expression levels provide the basis for the ordering of the
tumor samples determines which phenotypic relationships among the tumors are
reflected in the clustering patterns, applicants devised methods for selecting subsets of
genes optimized to reflect phenotypic relationships among the tumors.

(i) Selection of an intrinsic gene subset

The rationale behind the first optimized gene subset was Applicants'
recognition that specific features of a gene expression pattern that are to be used as the
basis for classifying tumors should typify that tumor; that is, these features should be
similar in any sample taken from the same tumor, and they should vary among
different tumors. The 22 pairs of independent samples taken from 22 different tumors
provided an opportunity for the selection of genes that fulfill these criteria. To select
a set of genes whose variation in expression optimally represented differences
between tumors rather than just differences between tumor samples, a
"within-between" score was assigned to each gene equal to the mean effect of the gene
on the pairwise correlation coefficients of the 22 matched tumor pairs less the mean
effect of the gene on the remaining 210 tumor-tumor pairwise correlation coefficients.
The "effect" of a gene on a pairwise correlation was defined as the difference in the
correlation coefficient with and without data for the gene included. Higher
"within-between" scores indicated that the gene had a good tendency to group together
paired samples.
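
The "within-between" score defined above can be illustrated with a minimal Python sketch. It assumes a genes x samples array of expression values and the use of Pearson correlation; the function names, the array layout, and the way the matched and unmatched sample pairs are supplied are hypothetical choices made for illustration and are not taken from the specification.

```python
import numpy as np

def pair_correlation(x, y):
    """Pearson correlation between two expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])

def gene_effect(expr, i, j, g):
    """Correlation of samples i and j with gene g included minus without it.

    expr is a genes x samples array (assumed layout).
    """
    with_gene = pair_correlation(expr[:, i], expr[:, j])
    keep = np.ones(expr.shape[0], dtype=bool)
    keep[g] = False  # drop gene g and recompute the correlation
    without_gene = pair_correlation(expr[keep, i], expr[keep, j])
    return with_gene - without_gene

def within_between_scores(expr, matched_pairs, other_pairs):
    """Mean effect on matched pairs less mean effect on the other pairs."""
    scores = np.empty(expr.shape[0])
    for g in range(expr.shape[0]):
        within = np.mean([gene_effect(expr, i, j, g) for i, j in matched_pairs])
        between = np.mean([gene_effect(expr, i, j, g) for i, j in other_pairs])
        scores[g] = within - between
    return scores
```

A gene whose values agree within each matched pair and differ between tumors raises the matched-pair correlations and lowers the others, so it receives a high score under this sketch.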

The 496 genes with a score one standard deviation above the mean score were
selected and defined as the "intrinsic" gene subset. To confirm the existence of an
"intrinsic" set of genes and to verify that the "within-between" score identified these
genes, the predictive quality of the score was examined using a type of
"leave-one-out" cross-validation analysis. The entire analysis was repeated 22 times,
each with one of the 22 matched pairs completely removed from the analysis. If an
"intrinsic" set of genes existed, and if the "within-between" score successfully
identified these genes, it was expected that the genes with high scores in each reduced
dataset would produce relatively high correlations in the excluded pair. When the
genes were sorted based on their "within-between" score in each reduced dataset, the
correlation coefficient of the excluded matched pair in sliding windows of 250 genes
increased progressively with increasing "within-between" score for nearly all of the
matched pairs, while no such increase was found when randomly matched pairs were
used.
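
The selection threshold and the sliding-window check described above might be sketched as follows. This is an illustration under stated assumptions: the population standard deviation is used for the cutoff, windows advance one gene at a time, and the function names and the genes x samples array layout are hypothetical.

```python
import numpy as np

def select_intrinsic(scores, n_sd=1.0):
    """Indices of genes scoring more than n_sd standard deviations above the mean."""
    scores = np.asarray(scores, dtype=float)
    cutoff = scores.mean() + n_sd * scores.std()  # population SD (assumption)
    return np.flatnonzero(scores > cutoff)

def sliding_window_correlation(expr, scores, i, j, window=250):
    """Correlation of samples i and j over windows of genes sorted by ascending score."""
    order = np.argsort(scores)
    values = []
    for start in range(len(order) - window + 1):
        idx = order[start:start + window]
        values.append(float(np.corrcoef(expr[idx, i], expr[idx, j])[0, 1]))
    return np.array(values)
```

In a leave-one-out run, `scores` would be computed without the excluded pair and `i`, `j` would index that excluded pair; a rising trend across the returned values corresponds to the behavior reported in the text.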

The clustering method was used as described above to cluster the experimental 
samples based on the gene expression patterns of the 496 genes included in the 
"intrinsic" gene subset. 

(ii) Selection of an "epithelial-enriched" gene subset

A second optimized gene subset (called the "epithelial-enriched" gene subset)
was selected consisting of 374 genes that Applicants considered likely to be expressed
primarily by normal or malignant breast epithelial cells. The rationale for this gene
subset is that each of the tumors was ultimately caused by alterations in breast
epithelial cells. The seven individual subsets of genes that were chosen to form the
"epithelial-enriched" gene subset were selected from the 1753 gene cluster diagram.
The actual groups of genes chosen are listed in Table 7. These seven subsets of genes
included:

1) A subset that was very highly expressed in the cultured basal cell lines, along with
some of the other breast derived cell lines including Hs578T and BT-549;

2) A subset that was expressed in all of the cultured epithelial cell lines (both basal
and luminal);

3) A subset of genes centered around the high level of expression of Erb-B2;

4) A subset of genes that contained genes known to be important for tumor biology
(e.g., the urokinase receptor);

5) A subset that contained genes that were most highly expressed in the basal-like
tumors;

6) A subset of genes highly expressed in some of the luminal-like tumors;

7) A subset of genes that was primarily expressed in the four breast carcinoma derived
cell lines and/or in many of the luminal-like tumors.

The clustering method was used as described above to cluster the experimental 
samples based on the gene expression patterns of the 374 genes included in the 
"epithelial-enriched" gene set. 

To confirm the results of the clustering analysis described below, a "weighted
voting" method was applied to the data as described in Golub, T.R., et al., Science,
286, 531-537, 1999.
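
For orientation, the core of a two-class weighted-voting classifier in the style of the cited Golub reference can be sketched in Python. This is a simplified illustration, not the implementation used by Applicants: it omits the selection of informative genes, assumes a genes x samples training array, and uses hypothetical function names.

```python
import numpy as np

def train_weighted_voting(expr, labels):
    """Per-gene weights and decision midpoints for two classes.

    expr: genes x samples array; labels: True for class 1, False for class 2.
    """
    labels = np.asarray(labels, dtype=bool)
    mu1 = expr[:, labels].mean(axis=1)
    mu2 = expr[:, ~labels].mean(axis=1)
    sd1 = expr[:, labels].std(axis=1, ddof=1)
    sd2 = expr[:, ~labels].std(axis=1, ddof=1)
    weights = (mu1 - mu2) / (sd1 + sd2)   # signal-to-noise weight per gene
    midpoints = (mu1 + mu2) / 2.0
    return weights, midpoints

def predict_weighted_voting(sample, weights, midpoints):
    """Return (winning class, prediction strength) for one sample."""
    votes = weights * (np.asarray(sample, dtype=float) - midpoints)
    v1 = votes[votes > 0].sum()    # total vote for class 1
    v2 = -votes[votes < 0].sum()   # total vote for class 2
    winner = 1 if v1 >= v2 else 2
    total = v1 + v2
    strength = abs(v1 - v2) / total if total > 0 else 0.0
    return winner, float(strength)
```

Each gene votes for the class whose mean expression is closer to the sample's value, weighted by how well that gene separates the two classes in the training data.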

Results 

The 496 genes included in the "intrinsic" gene set are identified in Table 6. 

Two large branches were apparent in the tumor dendrogram that resulted from
analysis based on this gene set, and within each of these two branches, smaller
branches were identified for which common biological themes could be inferred. The
branches are colored accordingly (basal-like = ORANGE, Erb-B2 positive = PINK,
normal breast-like = GREEN, and luminal epithelial-like = BLUE). Seventeen of the
20 before and after doxorubicin pairs (indicated with suffixes BE and AF following
the numerical identifier for each tumor) were matched together on terminal
dendrogram branches (red branches), as were both of the tumor/lymph node
metastasis pairs (blue branches). The small black bars beneath the dendrogram
identify the 17 pairs that were correctly matched by this hierarchical clustering, while
the larger green bars identify the positions of the three pairs that were not matched by
the clustering. It is noted that the after-chemotherapy sample in each of these three
sample pairs was clustered in a branch with normal breast tissue samples. Thus, as for
the 1753 gene set described in Examples 6 and 7, the intrinsic gene subset correctly
identified independent tumor samples from the same tumor as related to each other.
Despite the potential confounding effects of an interval of 16 weeks, independent
surgical procedures and cytotoxic chemotherapy, the independent samples taken from
the same tumor were in most cases recognizably more similar to each other in their
overall pattern of gene expression than either was to any of the other samples. In
addition, samples taken from a primary tumor and a metastasis from the same tumor
could be recognized as closely related to one another. Thus in most cases independent
samples from the same tumor were recognizable as such solely on the basis of gene
expression patterns. This implies that the patterns of gene expression are
homogeneous and stable in each breast tumor and yet sufficiently diverse between
tumors so that they can be viewed as molecular portraits of each tumor.

The 374 genes included in the "epithelial-enriched" subset are listed in Table
8. Figure 2 presents a comparison of tumor dendrograms representing the results of
hierarchical clustering of experimental samples using the "intrinsic" gene set and the
dendrogram obtained by clustering using the "epithelial-enriched" gene set. The
dendrograms are colored according to the clustering patterns obtained using the
"intrinsic" gene set. Only two tumors (identified by the colored arrows) were placed
in significantly different groups when the clustering was based on expression of the
"epithelial-enriched" gene set instead of the "intrinsic" gene set.

The overall architecture of the two dendrograms representing the clustering of
breast tumor samples using these two alternative gene sets was very similar, with only
two tumor pairs (i.e., Norway 14 and 26) materially changing position (Figure 2).
Thus, the classifications derived from the "intrinsic" gene set are consistent with the
results using the "epithelial-enriched" gene set, even though the two sets shared only
25% of their genes.

A consistent division of the tumor samples into two subgroups was a striking
feature of the dendrograms produced by both gene sets. Application of the "weighted
voting" method of Golub recapitulated the sorting of the tissue samples between these
two subgroups for all but one of the 65 samples, thus confirming the robustness of the
division.

Example 9 

Identification of Breast Tumor Subgroups Based on Optimized Gene Sets 

Several groups of tumors that shared pervasive similarities in their expression
patterns could be identified by cluster analysis; the dendrograms in Figure 2 are
color-coded to highlight these subgroups. Characteristic features of the expression
patterns, or the membership, of each highlighted group also suggested biological
interpretations. These data confirm the ability of the clustering method to divide
breast tumors into meaningful subgroups when applied using the "intrinsic" and
"epithelial-enriched" gene subsets. Specific subgroups are discussed below and are
named according to correlations between the genes expressed at high levels in the
tumors and genes known to be expressed in particular cell types.

Luminal Epithelial Cell Pattern: As described above, the major distinction was
between a large group of tumors (identified by blue letters and dendrogram branches)
and a second large group that included all of the other tumor subtypes and the normal
breast samples (highlighted in other colors). The tumors in this "blue" group were
characterized by relatively high levels of expression of many genes known to be
expressed by the luminal epithelial cells of the normal mammary duct, notably
including the estrogen and prolactin receptors. This connection was further
corroborated using immunohistochemical analysis of breast tumor sections using
antibodies against the luminal cell keratins 8/18, which stained the carcinoma cells in
tumor specimens in this "blue" branch as shown, for example, in Figure 3f. With one
exception, none of the tumors in this group expressed Erb-B2 at high levels. An
estrogen receptor-positive phenotype is known to be associated with a relatively
favorable prognosis 30,31, while Erb-B2 expression is believed to contribute to
tumorigenesis.

Normal Breast Tissue Pattern: Several tumors, including two "before and after" pairs
and the single fibroadenoma tested (displayed in green), were clustered in a group of
samples that contained all three of the normal breast specimens. The "normal breast"
gene expression pattern was typified by a relatively high level of expression of genes
characteristic of basal epithelial cells and adipose cells, and relatively low levels of
expression of genes characteristic of luminal epithelial cells.

Basal Epithelial Cell Pattern: Many of the genes characteristic of basal epithelial cells
were highly expressed in a group of six tumors (New York 2 and 3, Stanford 14 and
23, and Norway 41 and 109, indicated in orange in the dendrogram) that were
clustered based on pervasive similarities in their gene expression patterns. To
corroborate the "basal cell-like" characteristics of these tumors,
immunohistochemistry was performed using antibodies against keratins 5/6, 8/18, and
17. All six of these tumors showed staining for keratins 5/6 and/or 17 (basal cell
keratins), and no staining for keratins 8/18 (see Figure 3e). Notably, these six tumors
also failed to express the estrogen receptor and most of the other genes that were
usually co-expressed with it. Approximately 90% of breast tumors are suggested to
have characteristics of luminal epithelial cells, while the characteristics of the
remaining 10% are less well defined 18. Breast tumors that stain positive for basal cell
keratins may account for 3-15% of all breast tumors 41-46.

The incidence among the tumor samples described herein was 15% (6/40). Many of
the tumors that stained positive for basal cell keratins only showed staining in a
fraction of the tumor cells, and neither basal nor luminal keratins could be detected in
any of the other remaining tumor cells (Figure 3e).

Erb-B2 Positive: As mentioned above, overexpression of the Erb-B2 oncogene was
associated with a high level of expression of a specific set of genes, almost all of
which map to the Erb-B2 region of chromosome 17 33. A clustered group of tumors
was identified that was partially characterized by the high level of expression of this
subset of genes (Stanford 2 and Norway 47, 53, 57 and 101). These tumors showed
low levels of expression of the estrogen receptor 48,49 and almost all of the other genes
associated with estrogen receptor expression, a trait they share with the "basal-like"
tumors, and which may contribute to the poor prognosis associated with these two
subtypes of breast tumors 41-43,49,50; in addition, both the basal-like and Erb-B2 positive
tumors also show many p53 sequence mutations (see Table 5).

Example 10 

Producing Antibodies to Basal Marker Polypeptides and Cytokeratin 17

This example describes the generation of polyclonal antibodies that bind to
cytokeratin 17 and the generation of polyclonal antibodies that bind to the
polypeptides encoded by the three basal marker genes described herein, i.e.,
cadherin3, matrix metalloproteinase 14, and cadherin EGF LAG seven-pass G-type
receptor 2. The example further describes affinity purification of the antibodies.

Materials

• Anisole (Cat. No. A4405, Sigma)
• 2,2'-azino-di-(3-ethyl-benzthiazoline-sulfonic acid) (ABTS) (Cat. No. A6499,
Molecular Probes, Eugene, OR)
• Activated Maleimide Keyhole Limpet Hemocyanin (Cat. No. 77106, Pierce Chemical
Co., Rockford, IL)
• Biotin (Cat. No. B2643, Sigma)
• Boric acid (Cat. No. B0252, Sigma)
• Sepharose 4B (Cat. No. 17-0120-01, LKB/Pharmacia, Uppsala, Sweden)
• Bovine Serum Albumin (LP) (Cat. No. 100 350, Boehringer Mannheim,
Indianapolis, IN)
• Cyanogen bromide (Cat. No. C6388, Sigma, St. Louis, MO)
• Dialysis tubing Spectra/Por Membrane MWCO: 6-8,000 (Cat. No. 132 665,
Spectrum Industries Inc., Laguna Hills, CA)
• Dimethyl formamide (DMF) (Cat. No. 22705-6, Aldrich Chemical Company,
Milwaukee, WI)
• DIC (Cat. No. BP 592-500, Fisher)
• Ethanedithiol (Cat. No. 39,802-0, Aldrich Chemicals, Milwaukee, WI)
• Ether (Cat. No. TX 1275-3, EM Sciences)
• Ethylenediaminetetraacetic acid (EDTA) (Cat. No. BP 120-1, Fisher Scientific,
Springfield, NJ)
• 1-ethyl-3-(3'-dimethylaminopropyl)-carbodiimide, HCl (EDC) (Cat. No. 341-006,
Calbiochem, San Diego, CA)
• Freund's Adjuvant, complete (Cat. No. M-0638-50B, Lee Laboratories, Grayson,
GA)
• Freund's Adjuvant, incomplete (Cat. No. M0639-50B, Lee Laboratories)
• Fritted chromatography columns (Column part No. 12131011; Frit: Part No.
12131029, Varian Sample Preparation Products, Harbor City, CA)
• Gelatin from Bovine Skin (Cat. No. G9382, Sigma)
• Glycine (Cat. No. BP381-5, Fisher)
• Goat anti-rabbit IgG, biotinylated (Cat. No. A 0418, Sigma)
• HOBt (Cat. No. 01-62-0008, Calbiochem-Novabiochem)
• Horseradish peroxidase (HRP) (Cat. No. 814 393, Boehringer Mannheim)
• HRP-Streptavidin (Cat. No. S 5512, Sigma)
• Hydrochloric Acid (Cat. No. 71445-500, Fisher)
• Hydrogen Peroxide 30% w/w (Cat. No. H1009, Sigma)
• Methanol (Cat. No. A412-20, Fisher)
• Microtiter plates, 96 well (Cat. No. 2595, Corning-Costar, Pleasanton, CA)
• N-α-Fmoc protected amino acids available from Calbiochem-Novabiochem, San
Diego, CA. See 1997-1998 catalog pages 1-45.
• N-α-Fmoc protected amino acids attached to Wang Resin available from
Calbiochem-Novabiochem. See 1997-1998 catalog pages 161-164.
• NMP (Cat. No. CAS 872-50-4, Burdick and Jackson, Muskegon, MI)
• Peptide (Synthesized by Research Genetics, Inc. Details given below)
• Piperidine (Cat. No. 80640, Fluka, available through Sigma)
• Sodium Bicarbonate (Cat. No. BP328-1, Fisher)
• Sodium Borate (Cat. No. B9876, Sigma)
• Sodium Carbonate (Cat. No. BP357-1, Fisher)
• Sodium Chloride (Cat. No. BP 358-10, Fisher)
• Sodium Hydroxide (Cat. No. SS 255-1, Fisher)
• Streptavidin (Cat. No. 1 520, Boehringer Mannheim)
• Thioanisole (Cat. No. T-2765, Sigma)
• Trifluoroacetic acid (Cat. No. TX 1275-3, EM Sciences)
• Tween-20 (Cat. No. BP 337-500, Fisher)
• Wetbox (Rubbermaid Rectangular Servin' Saver™ Part No. 3862, Wooster, OH)



Solutions 

• BBS: Borate Buffered Saline with EDTA dissolved in distilled water (pH 8.2 to
8.4 with HCl or NaOH)
-25 mM Sodium borate (Borax)
-100 mM Boric Acid
-75 mM NaCl
-5 mM EDTA

• 0.1 N HCl in saline
-concentrated HCl (8.3 mL/0.917 L distilled water)
-0.154 M NaCl

• Glycine (pH 2.0 and pH 3.0) dissolved in distilled water and adjusted to the
desired pH.
-0.1 M glycine
-0.154 M NaCl

• 5X Borate 1X Sodium Chloride dissolved in distilled water.
-0.11 M NaCl
-60 mM Sodium Borate
-250 mM Boric Acid

• Substrate Buffer in distilled water adjusted to pH 4.0 with sodium hydroxide:
-50 to 100 mM Citric Acid



Peptide Synthesis Solutions

• AA solution: HOBt is dissolved in NMP (8.8 grams HOBt to 1 liter NMP).
Fmoc-N-α-amino acid is added at a concentration of 0.53 M.

• DIC solution: 1 part DIC to 3 parts NMP.

• Deprotecting solution: 1 part Piperidine to 3 parts DMF.

• Reagent R: 2 parts anisole, 3 parts ethanedithiol, 5 parts thioanisole, 90 parts
trifluoroacetic acid.

Equipment

• MRX Plate Reader (Dynatech Inc., Chantilly, VA) 

• Hamilton Eclipse (Hamilton Instruments, Reno, NV) 

• Beckman TJ-6 Centrifuge, Refrigerated (Model No. TJ-6, Beckman Instruments, 
Fullerton, CA) 

• Chart Recorder (Recorder 1 Part No. 18-1001-40, Pharmacia LKB Biotechnology)

• UV Monitor (Uvicord SII Part No. 18-1004-50, Pharmacia LKB Biotechnology)

• Amicon Stirred Cell Concentrator (Model 8400, Amicon Inc., Beverly, MA) 

• 30 kD MW cut-off filter (Cat. No. YM-30 Membranes Cat. No. 13742, Amicon 
Inc., Beverly, MA) 

• Multi-channel Automated Pipettor (Cat. No. 4880, Corning Costar Inc., 
Cambridge, MA) 

• pH Meter Corning 240 (Corning Science Products, Corning Glassworks, Corning, 
NY) 

• ACT396 peptide synthesizer (Advanced ChemTech, Louisville, KY)

• Vacuum dryer (Box is from Labconco, Kansas City, MO; Pump is from Alcatel, 
Laurel MD). 

• Lyophilizer (Unitop 600sl in tandem with Freezemobile 12, both from Virtis, 
Gardiner, NY) 

Methods

Peptides were selected using the program Omiga™ 1.1 (Oxford Molecular
Group, Inc., 2105 So. Bascom Ave., Suite 200, Campbell, CA 95008) using the
Hopp/Woods method, which is described in Hopp TP, Woods KR, A computer
program for predicting protein antigenic determinants, Mol. Immunol.,
Apr;20(4):483-9, 1983, and Hopp TP and Woods KR, Proc. Natl. Acad. Sci. U.S.A.
78, 3824-3828, 1981. Preferred peptide sequences displayed minimal homology with
known proteins. Three peptide sequences were selected for each polypeptide. The
sequences were as follows:

Peptides for antibodies that bind to cadherin3 (GenBank accession number
NP_001784):

RAVFREAEVTLEAGGAEQE (SEQ ID NO:4)
QEPALFSTDNDDFTVRN (SEQ ID NO:5)
QKYEAHVPENAVGHE (SEQ ID NO:6)

Peptides for antibodies that bind to matrix metalloproteinase 14 (GenBank accession
number NP_004986):

AYIREGHEKQADIMIFFAE (SEQ ID NO:7)
DEASLEPGYPKHIKELGR (SEQ ID NO:8)
RGSFMGSDEVFTYFYK (SEQ ID NO:9)

Peptides for antibodies that bind to cadherin EGF LAG seven-pass G-type
receptor 2 (GenBank accession number NP_001399):

QASSLRLEPGRANDGDWH (SEQ ID NO:10)
ELKGFAERLQRNESGLDSGR (SEQ ID NO:11)
RSGKSQPSYIPFLLREE (SEQ ID NO:12)

Peptides for antibodies that bind to cytokeratin 17:

KKEPVTTRQVRHVEE (SEQ ID NO:13)
QDGKVISSREQVHQTTR (SEQ ID NO:14)
SSSKGSSGLGGGSS (SEQ ID NO:15)
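
The Hopp/Woods selection step mentioned above can be illustrated with a short Python sketch of a hydrophilicity profile. The scale values are those commonly tabulated from the cited 1981 Hopp and Woods paper and should be checked against that reference; the six-residue window, the function names, and the idea of simply reporting the highest-scoring window are assumptions for illustration. How the Omiga program implements the method is not stated in the specification.

```python
# Hopp-Woods hydrophilicity values per residue (as commonly tabulated).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}

def hydrophilicity_profile(sequence, window=6):
    """Mean hydrophilicity of every window of consecutive residues."""
    values = [HOPP_WOODS[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def most_hydrophilic_window(sequence, window=6):
    """Start index, residues, and mean score of the highest-scoring window."""
    profile = hydrophilicity_profile(sequence, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best, sequence[best:best + window], profile[best]
```

Calling `most_hydrophilic_window` on a full-length protein sequence would point to a candidate surface-exposed stretch, which could then be screened for homology with other known proteins as the text describes.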

Synthesis of Peptides 

Incubate: Resin was immersed in the appropriate solution. All incubation steps occurred
with mixing.

Wash: Added 2 mL DMF, incubated 5 minutes, and drained.

Wash Cycle: Five washes.

Machine Synthesis

The sequence of the desired peptide was provided to the peptide synthesizer. The
C-terminal residue was determined and the appropriate Wang Resin was attached to the
reaction vessel. The peptides were synthesized C-terminus to N-terminus by adding
one amino acid at a time using a synthesis cycle. The amino acid added in each cycle
was controlled by the peptide synthesizer, which refers to the sequence of the peptide
entered into its database.

Step 1 - Resin Swelling: Added 2 mL DMF, incubated 30 minutes, drained DMF.
Step 2 - Synthesis cycle
2a - Deprotection: 1 mL deprotecting solution was added to the reaction
vessel and incubated for 20 minutes.
2b - Wash Cycle
2c - Coupling: 750 µL of amino acid solution and 250 µL of DIC solution
were added to the reaction vessel. The reaction vessel was incubated for
thirty minutes and washed once. The coupling step was repeated once.
2d - Wash Cycle
Step 2 was repeated over the length of the peptide. The amino acid solution changed
as the sequence listed in the peptide synthesizer dictated.
Step 3 - Final Deprotection: Steps 2a and 2b were performed one last time.

Resins were deswelled in methanol (rinsed twice in 5 mL methanol, incubated 5
minutes in 5 mL methanol, rinsed in 5 mL methanol) and then vacuum dried.

Peptide was removed from the resin by incubating 2 hours in reagent R and then
precipitated into ether. Peptide was washed in ether and then vacuum dried. Peptide
was resolubilized in diH2O, frozen, and lyophilized overnight.
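
The order of operations in the machine synthesis above can be summarized with a small Python sketch that lists the steps for a peptide written N-terminus to C-terminus. The function name and the step wording are hypothetical; the sketch only restates the sequence of steps given in the text and does not model the synthesizer's software.

```python
def synthesis_plan(peptide):
    """Ordered step list for a peptide given N- to C-terminus.

    The C-terminal residue comes pre-attached to the Wang resin, so
    coupling starts with the second-to-last residue and proceeds
    toward the N-terminus.
    """
    c_terminal, remaining = peptide[-1], peptide[:-1]
    steps = [
        f"Load Wang resin carrying {c_terminal}",
        "Step 1: swell resin in 2 mL DMF, 30 min",
    ]
    for residue in reversed(remaining):
        steps += [
            "Step 2a: deprotect, 1 mL deprotecting solution, 20 min",
            "Step 2b: wash cycle (five washes)",
            f"Step 2c: couple {residue} (coupling performed twice)",
            "Step 2d: wash cycle (five washes)",
        ]
    steps += [
        "Step 3: final deprotection (as in 2a)",
        "Step 3: wash cycle (as in 2b)",
    ]
    return steps
```

For example, `synthesis_plan("KKEPVTTRQVRHVEE")` would begin with a resin carrying E and then couple E, V, H and so on toward the N-terminal K.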

30 

Conjugation of Peptide with Keyhole Limpet Hemocyanin 

Peptide (6 mg) was dissolved in PBS (6 mL) and mixed with 6 mg of maleimide
activated KLH carrier in 6 mL of PBS for a total volume of 12 mL. The entire
solution was mixed for two hours, dialyzed in 1 L PBS, and lyophilized.


Immunization of Rabbits 
Two New Zealand White Rabbits were injected with 250 µg keyhole limpet
hemocyanin (KLH) conjugated peptide in an equal volume of complete Freund's
adjuvant and saline in a total volume of 1 mL. Antigens (KLH-Peptide, 100 µg each)
in an equal volume of incomplete Freund's Adjuvant and saline were injected into
three to four subcutaneous dorsal sites for a total volume of 1 mL two, four, and six
weeks after the first immunization. The three peptides were injected together.

The immunization schedule was as follows: 



Day 0     Pre-immune bleed, primary immunization
Day 15    1st Boost
Day 27    1st Bleed
Day 44    2nd Boost
Day 57    2nd Bleed and 3rd Boost
Day 69    3rd Bleed
Day 84    4th Boost
Day 98    4th Bleed



The Collection of Rabbit Serum

The rabbits were bled (30 to 50 mL) from the auricular artery. The blood was allowed
to clot at room temperature for 15 minutes and the serum was separated from the clot
using an IEC DPR-6000 centrifuge at 5000 x g. Cell-free serum was decanted gently
into a clean test tube and stored at -20°C for affinity purification.

Determination of Antibody Titer

All solutions with the exception of wash solution were added by the Hamilton
Eclipse, a liquid handling dispenser. The antibody titer was determined in the rabbits
using an ELISA assay with peptide on the solid phase. Flexible high binding ELISA
plates were passively coated with peptide diluted in BBS (100 µL, 1 µg/well) and the
plate was incubated at 4°C in a wetbox overnight (air-tight container with moistened
cotton balls). The plates were emptied and then washed three times with BBS
containing 0.1% Tween-20 (BBS-TW) by repeated filling and emptying using a
semi-automated plate washer. The plates were blocked by completely filling each well
with BBS-TW containing 1% BSA and 0.1% gelatin (BBS-TW-BG) and incubating for 2
hours at room temperature. The plates were emptied and both pre- and post-immune
sera were added to wells. The first well contained sera at 1:50 in BBS.
The sera were then serially titrated eleven more times across the plate at a ratio of 1:1
for a final (twelfth) dilution of 1:204,800. The plates were incubated overnight at
4°C. The plates were emptied and washed three times as described.
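
The dilution series can be tabulated with a short Python sketch. It assumes that a 1:1 titration doubles the dilution at each step and that the first well is counted at 1:50; the function name and defaults are hypothetical. Under these assumptions the twelfth well works out to 1:102,400, so the stated final dilution of 1:204,800 would correspond to a series that effectively begins at 1:100 (for example, if the first 1:50 serum is itself mixed 1:1 in the well).

```python
def dilution_series(first=50, wells=12, factor=2):
    """Reciprocal dilutions across a row, multiplying by `factor` each well."""
    return [first * factor ** k for k in range(wells)]

if __name__ == "__main__":
    for well, dilution in enumerate(dilution_series(), start=1):
        print(f"well {well:2d}: 1:{dilution:,}")
```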

Biotinylated goat anti-rabbit IgG (100 µL) was added to each microtiter plate test well
and incubated for four hours at room temperature. The plates were emptied and
washed three times. Horseradish peroxidase-conjugated Streptavidin (100 µL diluted
1:10,000 in BBS-TW-BG) was added to each well and incubated for two hours at
room temperature. The plates were emptied and washed three times. The ABTS was
prepared fresh from stock by combining 10 mL of citrate buffer (0.1 M at pH 4.0), 0.2
mL of the stock solution (15 mg/mL in water) and 10 µL of 30% H2O2. The ABTS
solution (100 µL) was added to each well and incubated at room temperature. The
plates were read at 414 nm, 20 minutes following the addition of substrate.

Preparation of the Peptide Affinity Purification Column: 

The affinity column was prepared by conjugating 5 mg of peptide to 10 mL of
cyanogen bromide-activated Sepharose 4B, and 5 mg of peptide to
hydrazine-Sepharose 4B. Briefly, 100 µL of DMF was added to peptide (5 mg) and the
mixture was vortexed until the contents were completely wetted. Water was then added
(900 µL) and the contents were vortexed until the peptide dissolved. Half of the
dissolved peptide (500 µL) was added to separate tubes containing 10 mL of cyanogen
bromide-activated Sepharose 4B in 0.1 mL of borate buffered saline at pH 8.4 (BBS),
and 10 mL of hydrazine-Sepharose 4B in 0.1 M carbonate buffer adjusted to pH 4.5 using
excess EDC in citrate buffer pH 6.0. The conjugation reactions were allowed to
proceed overnight at room temperature. The conjugated Sepharose was pooled and
loaded onto fritted columns, washed with 10 mL of BBS, blocked with 10 mL of 1 M
glycine, and washed with 10 mL 0.1 M glycine adjusted to pH 2.5 with HCl and
re-neutralized in BBS. The column was washed with enough volume for the optical
density at 280 nm to reach baseline.

The Affinity Purification of Antibodies 

The peptide affinity column was attached to a UV monitor and chart recorder.
The titered rabbit antiserum was thawed and pooled. The serum was diluted with one
volume of BBS and allowed to flow through the columns at 10 mL per minute. The
non-peptide immunoglobulins and other proteins were washed from the column with
excess BBS until the optical density at 280 nm reached baseline. The columns were
disconnected and the affinity purified column was eluted using a stepwise pH gradient
from pH 7.0 to pH 1.0. The elution was monitored at 280 nm, and fractions
containing antibody (pH 3.0 to pH 1.0) were collected directly into excess 0.5 M
BBS. Excess buffer (0.5 M BBS) in the collection tubes served to neutralize the
antibodies collected in the acidic fractions of the pH gradient.

The entire procedure was repeated with "depleted" serum to ensure maximal recovery
of antibodies. The eluted material was concentrated using a stirred cell apparatus and
a membrane with a molecular weight cutoff of 30 kD. The concentration of the final
preparation was determined using an optical density reading at 280 nm. The
concentration was determined using the following formula: mg/mL = OD280/1.4
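
As a worked illustration of the concentration formula, the following Python sketch divides the optical density at 280 nm by 1.4, on the assumption that the formula reads mg/mL = OD280/1.4 (a conventional factor for IgG at 1 mg/mL in a 1 cm path). The function name and the optional dilution factor are hypothetical additions.

```python
def igg_concentration_mg_per_ml(od280, dilution_factor=1.0, extinction=1.4):
    """Antibody concentration in mg/mL from an OD reading at 280 nm.

    extinction: assumed absorbance of a 1 mg/mL IgG solution (1 cm path).
    dilution_factor: how many fold the sample was diluted before reading.
    """
    return od280 * dilution_factor / extinction
```

For instance, a reading of 0.7 on a sample diluted two-fold corresponds to about 1 mg/mL under these assumptions.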

Example 11

SDS-PAGE and Immunoblot Analysis of Basal Marker Polypeptides 

To investigate the expression pattern of cadherin3, matrix metalloproteinase
14, and cadherin EGF LAG seven-pass G-type receptor 2, extracts were made from a
variety of different cell lines and subjected to SDS-PAGE followed by
immunoblotting according to the protocol below, using affinity purified polyclonal
antibody to BSTP-ECG1 prepared as described in Example 10.

Materials 

• Acetic acid, Glacial (Cat. No. A38C-212, Fisher)
• Acrylamide (Cat. No. A-3553, Sigma)
• Anti-Rabbit IgG (H&L) (Cat. No. 31460ZZ, Pierce)
• Bis-acrylamide (Cat. No. M-7279, Sigma)
• Blotting paper (Cat. No. 170-3960, Bio-Rad, Hercules, CA)
• Bovine Serum Albumin (LP) (Cat. No. 100-350, Boehringer Mannheim,
Indianapolis, IN)
• Brilliant Blue R-250 (Cat. No. BP101-25, Fisher)
• Complete™ Mini (Cat. No. 1 836 153, Boehringer Mannheim)
• ECL Western Blotting Detection Reagents (Cat. No. RPN2106, Amersham
Pharmacia Biotech, Piscataway, NJ)
• Ethyl alcohol (AAPER Alcohol and Paper Chemical Co., Shelbyville, KY)
• Gelplate Clean (Cat. No. 786-140RF, Geno Technology, Inc., St. Louis)
• Gelatin (Cat. No. G-2500, Sigma)
• Glycerol (Cat. No. BP229-1, Fisher)
• Glycine (Cat. No. G-8898, Sigma)
• Hybond ECL (Cat. No. RPN303D, Amersham Pharmacia Biotech)
• Lauryl Sulfate (SDS) (Cat. No. L-3771, Sigma)
• Methanol (Cat. No. BP1105-4, Fisher)
• M-Per (Cat. No. 78501, Pierce, Rockford, IL)
• Nalgene bottle top filters (Cat. No. 09-740-62B, Fisher)
• Nonfat dry milk (Kroger Co., Cincinnati, OH)
• Ponceau-S (Cat. No. P-07170, Sigma)
• Potassium phosphate (Cat. No. P-0662, Sigma)
• 2X SDS gel loading buffer (Cat. No. 750006, Research Genetics, Huntsville, AL)
• Size markers (Cat. No. M-3913, M-4038, M-3788, Sigma)
• Sodium azide (Cat. No. S227I-25, Fisher)
• Sodium chloride (Cat. No. S271-3, Fisher)
• Sodium phosphate, Dibasic, Anhydrous (Cat. No. BP332-1, Fisher)
• t-amyl alcohol (Cat. No. A-l 6852, Sigma)
• TEMED (Cat. No. T-9281, Sigma)
• Trizma® Base (Cat. No. T-6066, Sigma)
• Tween-20 (Cat. No. BP337-500, Fisher)

Solutions

• PBS - Phosphate Buffered Saline dissolved in distilled water
-136 mM NaCl
-2.7 mM KCl
-10.1 mM Na2HPO4
-1.8 mM KH2PO4
• Acrylamide/Bis (30% T, 2.67% C) dissolved in distilled water
-4.1 M acrylamide
-51.9 mM N,N'-methylene-bis-acrylamide
• 1.5 M Tris-HCl (pH 8.8) dissolved in distilled water
• 0.5 M Tris-HCl (pH 6.8) dissolved in distilled water
• 10% SDS - dissolve 10 grams SDS in 100 ml distilled water
• Running Buffer
-24.8 mM Tris base
-191.9 mM glycine
-3.5 mM SDS
• Towbin transfer buffer (pH 8.3) dissolved in distilled water
-20% methanol
-25 mM Tris
-192 mM glycine
• Equilibrating buffer for gel drying, mixed in distilled water
-20% ethanol
-10% glycerol
• Gel staining solution dissolved in distilled water
-0.3 mM Coomassie brilliant blue R-250
-40% methanol
-7% glacial acetic acid
• Gel destaining solution mixed in distilled water
-25% methanol
-7% glacial acetic acid
• 10% Tween® 20 in PBS
• 5% Nonfat dry milk in PBS
• 0.2% BSA Blocking Buffer dissolved in PBS
-0.2% BSA
-0.1% gelatin
-0.05% Tween® 20
• Wash Buffer
-0.05% Tween® 20
-1X PBS
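
The molar concentrations listed above can be converted to weighable amounts, and the Acrylamide/Bis entry can be cross-checked against its %T/%C specification. The following is only a sketch: the molecular weights are standard textbook values that do not appear in this document, and the function names are illustrative.

```python
# Molecular weights in g/mol (standard values, assumed; not stated in the text).
MW = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
    "acrylamide": 71.08,
    "bis": 154.17,  # N,N'-methylene-bis-acrylamide
}

def grams_per_liter(conc_mM, mw):
    """Mass of solute (g) needed per liter for a concentration given in mM."""
    return conc_mM / 1000.0 * mw

def acrylamide_bis_molarity(pct_T, pct_C):
    """Convert %T (total monomer, g per 100 ml) and %C (bis as a percentage
    of total monomer) into (acrylamide in M, bis in mM)."""
    total_g_per_l = pct_T * 10.0
    bis_g_per_l = total_g_per_l * pct_C / 100.0
    acryl_g_per_l = total_g_per_l - bis_g_per_l
    return acryl_g_per_l / MW["acrylamide"], bis_g_per_l / MW["bis"] * 1000.0

# PBS recipe as listed in the Solutions section.
PBS_mM = {"NaCl": 136.0, "KCl": 2.7, "Na2HPO4": 10.1, "KH2PO4": 1.8}
pbs_grams = {salt: round(grams_per_liter(c, MW[salt]), 2) for salt, c in PBS_mM.items()}

# 30% T, 2.67% C stock, as listed for the Acrylamide/Bis solution.
acryl_M, bis_mM = acrylamide_bis_molarity(30.0, 2.67)
```

Under these assumed molecular weights, the 30% T, 2.67% C stock works out to roughly 4.1 M acrylamide and 52 mM bis, in line with the values listed for that solution.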

Equipment

• Microcentrifuge (Model 5415, Eppendorf)
• Power Pak 200 (Cat. No. 165-5052, Bio-Rad)
• Power Pak 3000 (Cat. No. 165-5056, Bio-Rad)
• Protean II xi Cell (Cat. No. 165-1813, Bio-Rad)
• Recirculating chiller (Cat. No. CFT33D115V, Neslab Instruments, Inc., Portsmouth, NH)
• 20-Well comb (Cat. No. 165-1867, Bio-Rad)
• pH Meter Corning 240 (Corning Science Products, Corning Glasswares, Corning, NY)
• Air Cadet vacuum pump (Cat. No. P-07530-50, Cole-Parmer Instruments Co., Chicago, IL)
• Tissue Tearor tissue homogenizer (Cat. No. 985370-07, BioSpec Products Inc., Bartlesville, OK)

Methods 

Sample Preparation 

The following cell lines were used: 184B5, MCF7, OVCAR3, UACC62,
HepG2, Colo205, JURKAT, N-TERA2, MOLT4, Sw872. These cell lines
are well known in the art. Descriptions of these cell lines are provided in Table 3, in
Perou, et al., Molecular portraits of human breast tumours, Nature, 406(6797):747-52,
2000, in Ross, D. T. et al. Systematic Variation in Gene Expression Patterns in
Human Cancer Cell Lines. Nature Genetics, 24(3):227-35, 2000, and at the American
Type Culture Collection Web site: http://www.atcc.org. Cell lines were maintained
under standard growth conditions and in standard tissue culture media as appropriate
for the particular cell line. Cells were collected according to standard techniques (e.g.,
trypsinization in the case of adherent cells), and the resulting cell suspension was
prepared as follows:

-The cell suspension was pelleted by centrifugation at 3000 RPM for 10 minutes, and
the supernatant was discarded.

-The pellet was washed with 1 ml PBS, centrifuged at 10000 RPM for 10 minutes, and
the supernatant was discarded.

-An appropriate volume of M-Per™ Reagent was added to the cell pellet and mixed
gently for 10 minutes in an ice bath. The mixture was centrifuged at 13200 RPM for
15 minutes, and the supernatant was saved.

The protein concentration in the supernatant was measured according to standard
techniques.

All samples were mixed at 1:1 with gel loading buffer and boiled for 5 minutes before
loading.
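
The centrifugation steps above are specified in RPM, which depends on the rotor used. A reader reproducing the protocol on different equipment may wish to convert to relative centrifugal force (RCF). The sketch below uses the standard conversion formula; the rotor radius is a hypothetical value, since the text does not state it.

```python
def rpm_to_rcf(rpm, rotor_radius_cm):
    """Relative centrifugal force (x g) from rotor speed and radius,
    using the standard relation RCF = 1.118e-5 * r(cm) * rpm^2."""
    return 1.118e-5 * rotor_radius_cm * rpm ** 2

# Hypothetical rotor radius; the actual radius is not given in the text.
ASSUMED_RADIUS_CM = 7.0

# The three spin speeds named in the sample preparation steps.
rcf_by_step = {rpm: rpm_to_rcf(rpm, ASSUMED_RADIUS_CM) for rpm in (3000, 10000, 13200)}
```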


SDS PAGE 

Standard SDS-PAGE stacking and running gels were prepared and placed in an
electrophoresis apparatus. After filling the upper and lower chambers with running
buffer, the samples (60 µg/lane) were loaded. The inner core was placed in the lower
chamber and the lid placed on top. The apparatus was connected to the power supply
and recirculating system. The temperature setting was 10°C. The stacking gel was run
at 14 mA per gel for 1 hour. The separating gel was run at 0.58 mA per gel per hour for
16 hours.
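
As an illustration of how the 60 µg/lane load combines with the 1:1 mixing of sample and gel loading buffer, the sketch below computes the volumes per lane from a measured protein concentration. The function name and the example concentration are hypothetical.

```python
def lane_volumes(protein_ug_per_ul, target_ug=60.0):
    """Return (lysate_ul, loading_buffer_ul, total_ul) for one lane.

    The lysate volume delivers target_ug of protein; an equal volume of
    2X loading buffer is added (1:1 mix), so the total is twice the lysate volume.
    """
    lysate_ul = target_ug / protein_ug_per_ul
    buffer_ul = lysate_ul
    return lysate_ul, buffer_ul, lysate_ul + buffer_ul

# Hypothetical lysate at 4 µg/µl: 15 µl lysate + 15 µl buffer = 30 µl per lane.
example = lane_volumes(4.0)
```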

Transfer to nitrocellulose

After electrophoresis was complete, the gel was equilibrated in Towbin Buffer for 15-30
minutes. The assembly for transfer was as follows:

cathode
pre-soaked blotting paper
gel
pre-wetted nitrocellulose
pre-soaked blotting paper
anode

The transfer was performed at 20 V for 25 minutes, then 25 V for 20 minutes. After
the transfer was complete, the gel was stained with Coomassie and the blot was
stained with Ponceau-S.



Western Blotting 

Primary and secondary antibodies 

All primary and secondary antibodies were diluted in 0.2% BSA blocking buffer. All 
incubation steps were done with gentle mixing. 

Blots were blocked in 5% milk overnight at room temperature. The blots were rinsed
with wash buffer before adding the primary antibody and incubating for two hours at
room temperature. The primary antibodies were used at titers of 1:200, 1:500, and
1:1000 for anti-matrix metalloproteinase 14 and anti-cadherin EGF LAG seven-pass
G-type receptor 2 and at 1:100 for anti-cadherin3.
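
The titers above can be turned into pipetting volumes as sketched below. This assumes that a titer of 1:N means one part antibody stock in N parts final volume; the 10 ml incubation volume is a hypothetical example, as the text does not give one.

```python
def antibody_volume_ul(final_volume_ml, dilution_factor):
    """Volume of antibody stock (µl) for a 1:dilution_factor titer,
    read as 1 part stock in dilution_factor parts final volume (assumed)."""
    return final_volume_ml * 1000.0 / dilution_factor

# Hypothetical 10 ml of 0.2% BSA blocking buffer per blot.
volumes = {factor: antibody_volume_ul(10.0, factor) for factor in (100, 200, 500, 1000)}
```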

One wash cycle was performed. One wash cycle consisted of:

Wash 5 min, rinse
Wash 5 min, rinse
Wash 10 min, rinse
Wash 5 min, rinse
Wash 5 min, rinse

The secondary antibody was added and incubated for one hour at room temperature.
One wash cycle was then performed.

Peptide Block

As a control to demonstrate the specificity of the antibody, in some experiments equal
amounts (w/w) of peptide and antibody were added to 1/10 of the final volume of
blocking buffer and incubated overnight at 4°C. The volume of blocking buffer was
then brought up to the final volume, and the membrane was incubated for an
additional two hours at room temperature.

Developing 

The blots were placed in a Ziploc® bag. Equal volumes of ECL western blotting
detection reagents were mixed and distributed evenly over the blots. The blots were
placed in an autoradiography cassette, covered with a piece of film, and exposed.

Results 

Figure 4A shows a Western blot demonstrating expression of the cadherin3
polypeptide in various cell lines. The lane order is, from left to right: MCF-7,
Colo205, UACC62, JURKAT, HEPG2, N-TERA2, MOLT4, Sw872. The primary
antibody was used at a dilution of 1:100.

Figure 4B shows a Western blot demonstrating expression of the matrix
metalloproteinase 14 polypeptide in various cell lines. The lane order is, from left to
right: 184B5, MCF7, OVCAR3, UACC62, HepG2. The three images present
identical blots in which the primary antibody was used at dilutions of 1:200 (left),
1:500 (middle), and 1:1000 (right).

Figure 4C shows a Western blot demonstrating expression of the cadherin
EGF LAG seven-pass G-type receptor 2 polypeptide in various cell lines. The lane
order is, from left to right: 184B5, MCF7, OVCAR3, UACC62, HepG2. The three
images present identical blots in which the primary antibody was used at dilutions of
1:200 (left), 1:500 (middle), and 1:1000 (right).

For all three antibodies, the Western blots demonstrated that the antibodies
bind to a polypeptide of the expected size. All of the basal marker polypeptides are
expressed in a range of different cell types. While not wishing to be bound by any
theory, the inventors postulate that basal cells in tissues other than breast may express the
basal marker genes, which may make them useful for identification of basal tumor
subclasses for tumors other than breast tumors.

Example 12 

Immunohistochemical Staining of Breast Tumor Arrays with Antibodies to
Cytokeratin 17 Demonstrates that Cytokeratin 17 Expression Correlates with Poor
Outcome

Materials and Methods
Tissue arrays.

A total of 611 different paraffin embedded breast carcinoma samples were
identified in the files in the Department of Pathology at the University of Basel,
Women's Hospital Rheinfelden, and the Kreiskrankenhaus Lörrach. The specimens
were obtained from patients who underwent surgery in the period between 1985 and
1994. The histologic parameters for all cases were reviewed by a single pathologist
(JT) and the histologic type and grade were determined for each case according to
Elston and Ellis (Elston CW, Ellis IO: Pathological prognostic factors in breast cancer.
I. The value of histological grade in breast cancer: experience from a large study with
long-term follow-up. Histopathology 1991, 19:403-10).

Follow-up was obtained for 553 cases and ranged from 1 to 151 months with a
mean of 65.9 months. The use of these specimens and data for research purposes was
approved by the Ethics Committee of the Basel University Hospital. Tissue arrays
were constructed by obtaining 0.6 mm diameter tissue cores from each tumor and
placing these cores in a new paraffin block in rows and columns as described in
Kononen J, Bubendorf L, Kallioniemi A, Barlund M, Schraml P, Leighton S, Torhorst
J, Mihatsch MJ, Sauter G, Kallioniemi OP: Tissue microarrays for high-throughput
molecular profiling of tumor specimens [see comments]. Nat Med 1998, 4:844-7 and
in Schraml P, Kononen J, Bubendorf L, Moch H, Bissig H, Nocito A, Mihatsch MJ,
Kallioniemi OP, Sauter G: Tissue microarrays for gene amplification surveys in many
different tumor types. Clin Cancer Res 1999, 5:1966-75.

Each of the 611 cases was sampled twice, once from the center of the tumor,
and once from the periphery of the mass. Cores taken from the central area from each
case were combined in one array and cores taken from the periphery of the tumor
were combined in the other array.

Immunohistochemistry and scoring. 

Double staining of normal breast epithelium in conventional paraffin sections
was performed by first staining lumenal cells with CAM5.2 using alkaline
phosphatase/fast blue staining and subsequently staining basal cells with CK17 using
horseradish peroxidase/DAB staining.

Sections of arrays were stained with monoclonal antibodies specific for
cytokeratin 17 (DAKO, clone E3, dilution 1:10) and cytokeratin 5/6 (Boehringer
Mannheim, dilution 1:10) after antigen retrieval by microwaving in citrate buffer.
Note that the anti-cytokeratin 5/6 antibody used herein detects both cytokeratins 5 and
6. However, cytokeratin 5 is likely to be the major antigen recognized by this
antibody in breast basal cells. Staining results were scored as follows: 1 = invasive
tumor cells present in tissue core and no staining seen; 2 = invasive tumor cells
present and weak staining; 3 = invasive tumor cells present with strong staining. Only
those cores containing tissue consistent with a diagnosis of invasive carcinoma were
included in the outcome analysis. Cases that either had no tissue present on the array
sections or cases in which the material sampled consisted of fat, fibrosis, normal
breast glands, or in-situ carcinoma only, were omitted from further analysis.

Cytokeratins often showed only focal staining of tumor cells within the tissue array
cores or conventional paraffin sections. To account for the focal expression of CK17
and CK5/6, each of the 612 breast tumors was analyzed 4 times: with anti-CK17 and
anti-CK5/6 antibody on the "central sample" array, and with anti-CK17 and
anti-CK5/6 antibody on the "peripheral sample" array. A breast tumor sample was scored
as staining positive for the keratins if infiltrating carcinoma in one or more of the
cores from that sample reacted with either of the antibodies.
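
The scoring rule just described can be expressed compactly. The sketch below is an illustration only, not the procedure actually used by the inventors: it assumes each tumor has up to four core readings (two arrays, two antibodies), encodes a non-evaluable core as None, and treats scores of 2 (weak) or 3 (strong) as reactive. All names are hypothetical.

```python
def basal_keratin_status(core_scores):
    """Combine per-core scores into one call for a tumor.

    core_scores maps (array, antibody) to 1 (no staining), 2 (weak),
    3 (strong), or None when the core held no invasive carcinoma.
    Returns True (positive), False (negative), or None (not scorable).
    """
    evaluable = [s for s in core_scores.values() if s is not None]
    if not evaluable:
        return None  # omitted from further analysis
    return any(s >= 2 for s in evaluable)

# Hypothetical tumor: only the peripheral CK5/6 core shows weak staining.
tumor = {
    ("central", "CK17"): 1,
    ("central", "CK5/6"): None,
    ("peripheral", "CK17"): 1,
    ("peripheral", "CK5/6"): 2,
}
status = basal_keratin_status(tumor)
```

Taking the maximum over all evaluable cores is what makes the rule tolerant of focal staining: a single reactive core is enough to call the tumor positive.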

To aid in recognizing infiltrating carcinoma in the core samples, sections of
each array were also stained with an anti-cytokeratin antibody mix reacting with
cytokeratins 8 and 18 (CAM5.2, Becton & Dickinson, dilution 1:20) after antigen
unmasking by trypsin digestion to highlight invasive carcinoma cells.

Statistical analysis 

Univariate survival analysis based upon gene expression defined subgroups of
patients was performed by Kaplan-Meier statistics using WinSTAT software
(www.winstat.com). Subsequent multivariate analyses were performed using Cox's
proportional hazards model for survival data (Cox: Regression models and life tables.
Journal Royal Statistical Society 1972, 34:187-220).
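
For readers unfamiliar with the Kaplan-Meier method, a minimal product-limit estimator is sketched below. It is not the WinSTAT implementation used in the study, and the follow-up times and event flags are invented for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) at each distinct
    time at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after time t.
        n_at_risk -= removed
    return curve

# Invented data: six patients, two of them censored.
curve = kaplan_meier([6, 12, 12, 30, 45, 60], [1, 1, 0, 1, 0, 1])
```

Comparing two such curves (for example keratin-positive versus keratin-negative tumors) would additionally require a significance test such as the log-rank test, which is not shown here.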



Results 



Basal keratin staining in normal breast and breast carcinoma.
In normal breast, antibodies that bind to cytokeratin 17 (CK17) and cytokeratin
5/6 (CK5/6) stain the basal layer of breast glandular epithelium while antibodies that
bind to cytokeratins 8 and 18 stain lumenal cells (figures 3C and 3D). Whole paraffin
sections of breast carcinoma showed that cytokeratin 17 and 5/6 expression in paraffin
embedded tissue, when present, was focal (Figures 3E and 3F), with often less than 10%
of tumor cells reacting. In an attempt to study further the focal reactivity of the
monoclonal antibodies against the basal type cytokeratins, and to attempt to improve
the reliability of this test, rabbit antisera against CK17 were raised as described in
Example 12. This serum was tested on a separate tissue array with over 300
breast samples. The antiserum and the monoclonal antibody against CK17 showed
highly similar reactivity with epithelial cells in the breast cores. Both reagents stained
the same fraction of tumor cells, suggesting that neither is a significantly better
reagent. These results suggest that the focal reactivity seen with monoclonal
anti-CK17 was not due to weak reactivity of the monoclonal antibody but instead indicate that
within a tumor only a subset of tumor cells express these basal keratins, reinforcing
the need for alternative basal markers.

Basal keratin staining on breast carcinoma tissue arrays. 

Since the size of sample examined in tissue array cores is significantly smaller
than on conventional samples, there was a concern that the focal reactivity of basal
type cytokeratins might cause positive tumors to be missed. We decided to maximize
the chance of detecting basal keratin expression in the breast tumors on the arrays by
staining them with monoclonal antibodies directed at CK5/6 and CK17 and by
examining arrays made with cores taken from central and peripheral areas of the
tumors. By combining the results from the "central" array and the "peripheral" array,
532 tumors were available for CK17 analysis, 535 were available for CK5/6 analysis,
and 564 were available for either CK17 or CK5/6. The remainder of the tumors
represented on the arrays were either lost in transfer during sectioning of the tissue
array block, or showed no convincing invasive carcinoma on the core section. Of the
cases available for scoring, 75 and 63 tumors scored positive (either weakly or strongly)
for CK17 and CK5/6, respectively. By combining the results from the stains for CK17
and CK5/6, 90 cases (16%) out of the 564 tumors examined reacted with
CK17 and/or CK5/6. Follow-up data were available for 505 of the 564 cases on which CK
staining data were obtained. The follow-up period ranged from 1 to 151 months with a
mean of 66.1 months.
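
The counts reported above can be checked with simple arithmetic. The overlap figure computed below is not stated in the text; it follows from inclusion-exclusion only under the assumption that the 90 "either" cases are exactly the union of the 75 CK17-positive and 63 CK5/6-positive tumors.

```python
ck17_positive = 75
ck56_positive = 63
either_positive = 90
tumors_scored = 564

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B| (derived, not reported).
both_positive = ck17_positive + ck56_positive - either_positive

# Fraction of scored tumors positive for at least one basal keratin.
percent_positive = 100.0 * either_positive / tumors_scored
```

The percentage rounds to the 16% given in the text.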

Kaplan-Meier survival analysis on all patients with follow-up showed that the
absence of cytokeratin 17 and cytokeratin 5 is associated with a significantly better
prognosis than the presence of either of these cytokeratins (figure 5A, p=0.012). In
the group of 229 patients with known lymph node metastases, the expression of CK17
and CK5/6 had no predictive value. In contrast, in the group of 245 patients without
lymph node metastases, CK17 and/or CK5/6 expression was significantly associated
with shorter survival (figure 5B, p=0.006). The percentage of basal keratin positive
tumors was similar in patients with and without lymph node metastases. Multivariate
analysis on all patients taken together showed that the prognostic association of basal
cytokeratin expression with poor outcome was not independent from tumor size, LN
status and histologic grade. However, when analyzed on LN-negative tumors alone,
the expression of basal cytokeratins is not only a statistically significant
prognosticator, but is also independent of tumor size, tumor grade, her2neu status, ER
status, and GATA3 status. The results clearly demonstrate the utility of cytokeratin 17
as a marker for a subclass of tumors with a poor clinical outcome while also
highlighting the difficulties associated with use of anti-cytokeratin 17 antibodies.

Her2neu, estrogen receptor and GATA-3 staining on breast carcinoma arrays
To further confirm the accuracy of correlations between
immunohistochemistry results and clinical data obtained using tissue arrays, sections
of the arrays made with peripheral cores were stained for a variety of other proteins
known or suspected to be associated with a good or a poor clinical outcome, for
example estrogen receptor and Her2neu. As expected, expression of estrogen
receptors was associated with a better clinical outcome. This finding was independent
of BRE grade, LN status and size. In contrast, Her2neu expression was associated
with a poor prognosis. These results are compatible with published data and are
similar to those of two additional studies performed on the same breast tumor arrays
(Bucher C, Torhorst J, Kononen J, Haas P, Schraml L, Bubendorf L, Zuber M, Kochli
OR, Mross F, Dieterich H, Askaa J, Godtfredsen SE, Seelig S, Moch H, Mihatsch M,
Kallioniemi O, Sauter G: Prognostic significance of HER-2 amplification and
overexpression in breast cancer: Methodological comparison of fluorescence in situ
hybridization and immunohistochemistry using tissue microarrays of 611 primary
breast cancers, in press, 2001; Torhorst J, Bucher C, Kononen J, Haas P, Zuber M,
Kochli OR, Mross F, Dieterich H, Moch H, Mihatsch M, Kallioniemi O, Sauter G:
Tissue microarrays for rapid linking of molecular changes to clinical endpoints, in
press, 2001).

Sections of the arrays were also stained for GATA-binding protein 3, an
antigen thought to be co-expressed with estrogen receptors on the mRNA and protein
level (Hoch RV, Thompson DA, Baker RJ, Weigel RJ: GATA-3 is expressed in
association with estrogen receptor in breast cancer. International Journal of Cancer
1999, 84:122-8). The expression for GATA-3 was associated with a good clinical
outcome and had a high correlation (Chi-square=720.3 on 9 degrees of freedom) with
estrogen receptor expression. The staining results for estrogen receptor, GATA-3 and
her2neu confirm findings from prior studies, and also function as an independent
validation of tissue array-based studies.
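
To show how a chi-square statistic and its degrees of freedom arise from a table of paired staining scores, a minimal sketch follows. The layout of the table behind the reported value is not given in the text; 9 degrees of freedom would be consistent with, for example, a 4 x 4 table, but that is an assumption, and the counts below are invented.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an
    r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Invented 4 x 4 table of paired score categories (rows: marker A, columns: marker B).
hypothetical = [
    [40, 5, 3, 2],
    [6, 30, 4, 5],
    [3, 5, 35, 7],
    [2, 4, 6, 50],
]
stat, dof = chi_square(hypothetical)
```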

Tissue arrays present a number of advantages for tumor analysis. Analysis of
large numbers of tissue sections using conventional techniques is laborious and
expensive. An added disadvantage is that slides are stained in different batches, which
can introduce variation in staining intensity. In addition, the analysis of large numbers
of conventional glass slides makes comparisons between tumor samples difficult.
Many of these problems are circumvented by the new technique of tissue arrays. This
approach allows the efficient analysis of antibody reactivity on large numbers of
tumors that are stained together on the same slide.

The tissue array studies reported here allowed separation of the patient groups
into patients with lymph node metastasis and those without. In patients with
metastatic disease to the lymph nodes, the expression of the basal cytokeratins was not
associated with a significant difference in clinical outcome. However, in lymph node
negative patients the reactivity for these markers was associated with a poor prognosis
independent of tumor size, tumor grade, or immunostain reactivity for ER, her2neu or
GATA3. While not wishing to be bound by any theory, taken together with the gene
array data, these findings support the idea that anti-cytokeratin antibodies may
identify a different type of tumor rather than just another prognostic marker and
suggest the possibility that these tumors are derived from basal cells and not from
lumenal cells.

Due to the focal and often weak reactivity of monoclonal antibodies against
basal type keratins, the interpretation of staining results for these markers can be
difficult. The intensity of staining with these markers is not comparable with other
markers currently used in diagnosis of breast carcinoma, such as estrogen receptor and
her2neu, a feature that prevents their use in clinical settings. We attempted to
generate new reagents in the hope that they would have more robust IHC staining
characteristics. Analysis of over 300 breast carcinoma samples in a separate array
showed that the number of cells and the pattern of focal reactivity for the antiserum
against CK17 and the intensity of staining were similar to that seen with the
monoclonal antibodies. This indicates that the basal keratins are indeed only focally
expressed and that the low number of cells stained with antibodies is not due to a
weak reactivity of the monoclonal antibodies with the protein.

The studies presented here show that basal epithelial cytokeratin positive
tumors occur with a significant frequency (>10%) and are associated with a poor
prognosis. Patients with metastatic breast carcinoma to the axillary lymph nodes are at
high risk for recurrence and most receive adjuvant therapy. The situation for node
negative patients is less clear; depending on the size and grade of the tumor, the
reported recurrence rate varies between 5% and 30%. In lymph node negative patients, the
clinical decision whether to give or withhold systemic therapy is thus a difficult one,
and hence it is in this group of patients that the need for new prognostic markers is the
greatest. The relative size of this group of patients is also expected to increase, due to
continuing advances in screening and diagnostic techniques that identify increasingly
smaller breast tumors. Most of these smaller tumors have not metastasized to the
"sentinel" lymph node. This group of patients, therefore, has to make a difficult
choice between a variety of additional therapies, such as lumpectomy, mastectomy,
chemotherapy, radiation therapy, or hormonal therapy, in the absence of reliable
guidance from pathologic characteristics of their tumor. The cytokeratins 17 and 5/6
appear to detect a subcategory of tumors that behave poorly and may help in treatment
decisions for node-negative breast carcinoma patients. These results suggest that
patients who present with basal epithelial cytokeratin expressing tumors may be
candidates for more aggressive treatment procedures and also for alternate therapies
directed against tumors with this particular biology.

Example 13 

Immunohistochemical Staining of Normal Breast and Breast Tumor Samples in
Tissue Arrays with Antibodies to Basal Marker Polypeptides

Materials and Methods

Tissue arrays including normal breast and breast tumor samples were prepared
as described in Example 12. Monoclonal antibody to cytokeratin 5/6 (Boehringer
Mannheim, Inc.) and polyclonal, affinity purified, anti-peptide antibodies to
cadherin3, cadherin EGF LAG seven-pass G-type receptor 2, and matrix
metalloproteinase 14 prepared as described in Example 10 were used to perform
immunohistochemical staining using the DAKO Envision+, Peroxidase HC kit
(DAKO Corp., Carpinteria, CA) with DAB substrate according to the manufacturer's
instructions.

Results 

Figure 6 shows antibody staining of normal breast tissue cores. Figure 6A
shows staining with anti-cytokeratin 5/6 monoclonal antibody (ck5/6). Figures 6B,
6C, and 6D show staining with anti-cadherin 3 polyclonal antibody (s0158), anti-EGF
LAG seven-pass G-type receptor 2 polyclonal antibody (s0137), and
anti-metalloproteinase 14 polyclonal antibody (s0144), respectively, on sections from a
core derived from the same patient. The brown areas represent prominent staining of
the basal layer in the two-cell layered epithelium lining the mammary gland lumen.
These results confirm that the staining pattern of antibodies to the basal marker
polypeptides identified herein is comparable to that of antibodies to cytokeratin 17 in
terms of the cell type stained and the ability to distinguish between basal and luminal
cells in the normal mammary gland.
Figure 7 shows antibody staining of breast cancer tissue cores. Figure 7A
shows antibody staining with anti-cytokeratin 5/6 monoclonal antibody (ck5/6).
Figures 7B and 7C show staining with anti-EGF LAG seven-pass G-type receptor 2
polyclonal antibody (s0137) and anti-cadherin 3 polyclonal antibody (s0158),
respectively. The brown areas represent prominent staining of the epithelial cells
within tumor tissue. Note the loss of normal breast glandular
architecture consistent with the diagnosis of carcinoma.

CLAIMS

We claim:

1. A method of classifying a tumor comprising the steps of:
providing a tumor sample;
detecting expression or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the sample; and
classifying the tumor as belonging to a tumor subclass based on the results of the detecting step.

2. A method of classifying a tumor comprising the steps of:
providing a tumor sample;
detecting expression or activity of a gene encoding the polypeptide of SEQ ID NO:2 in the sample; and
classifying the tumor as belonging to a tumor subclass based on the results of the detecting step.

3. A method of classifying a tumor comprising the steps of:
providing a tumor sample;
detecting expression or activity of a gene encoding the polypeptide of SEQ ID NO:3 in the sample; and
classifying the tumor as belonging to a tumor subclass based on the results of the detecting step.

4. A method of classifying a tumor comprising the steps of:
providing a tumor sample;
detecting expression or activity of at least two genes selected from the group consisting of: a gene encoding the polypeptide of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3 in the sample; and
classifying the tumor as belonging to a tumor subclass based on the results of the detecting step.

5. The method of any of claims 1, 2, 3, or 4, wherein the detecting step comprises detecting the polypeptide or polypeptides.

6. The method of claim 5, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

6a. The method of claim 5, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

6b. The method of claim 5, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

6c. The method of claim 5, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

7. The method of any of claims 1, 2, 3, or 4, wherein classifying a tumor comprises:
stratifying a subject having the tumor for a clinical trial.

8. The method of claim 7, wherein the tumor is a breast tumor.

9. The method of any of claims 1, 2, 3, or 4, wherein the tumor is a breast tumor and the tumor subclass is a basal tumor subclass.

1a. The method of claim 1, further comprising:
providing diagnostic, prognostic, or predictive information based on the classifying step.

2a. The method of claim 2, further comprising:
providing diagnostic, prognostic, or predictive information based on the classifying step.

3a. The method of claim 3, further comprising:
providing diagnostic, prognostic, or predictive information based on the classifying step.

4a. The method of claim 4, further comprising:
providing diagnostic, prognostic, or predictive information based on the classifying step.

5a. The method of claim 5, further comprising:
providing diagnostic, prognostic, or predictive information based on the classifying step.

6aa. The method of claim 5a, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

6ab. The method of claim 5a, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

6ac. The method of claim 5a, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

6ad. The method of claim 5a, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

9a. The method of any of claims 1a, 2a, 3a, or 4a, wherein the tumor is a breast tumor and the tumor subclass is a basal tumor subclass.

1g. The method of claim 1, further comprising:
selecting a treatment based on the classifying step.

2g. The method of claim 2, further comprising:
selecting a treatment based on the classifying step.

3g. The method of claim 3, further comprising:
selecting a treatment based on the classifying step.

4g. The method of claim 4, further comprising:
selecting a treatment based on the classifying step.

5g. The method of claim 5, further comprising:
selecting a treatment based on the classifying step.

6ag. The method of claim 5g, wherein the polypeptide is detected by performing immunohistochemical analysis on the sample using an antibody that specifically binds to the polypeptide.

6bg. The method of claim 5g, wherein the polypeptide is detected by performing an ELISA assay using an antibody that specifically binds to the polypeptide.

6cg. The method of claim 5g, wherein the polypeptide is detected using an antibody array comprising an antibody that specifically binds to the polypeptide.

6dg. The method of claim 5g, wherein the detecting step comprises:
detecting modification of a substrate by the polypeptide.

9g. The method of any of claims 1g, 2g, 3g, or 4g, wherein the tumor is a breast tumor and the tumor subclass is a basal tumor subclass.

1m. A method of testing a subject comprising the steps of:
providing a sample isolated from a subject;
detecting expression or activity of a gene encoding the polypeptide of SEQ ID NO:1 in the sample; and
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1 providing diagnostic, prognostic, or predictive information based on the 

2 detecting step. 
3 

4 2m. A method of testing a subject comprising the steps of: 

5 providing a sample isolated from a subject; 

6 detecting expression or activity of a gene encoding the polypeptide of SEQ ID 

7 NO:2 in the sample; and 

8 providing diagnostic, prognostic, or predictive information based on the 

9 detecting step. 
10 

1 1 3m. A method of testing a subject comprising the steps of: 

12 providing a sample isolated from a subject; 

1 3 detecting expression or activity of a gene encoding the polypeptide of SEQ ID 

14 NO:3 in the sample; and 

15 providing diagnostic, prognostic, or predictive information based on the 

16 detecting step. 
17 

18 4m. A method of testing a subject comprising the steps of: 

19 providing a sample isolated from the subject;

20 detecting expression or activity of at least two genes selected from the group

21 consisting of: a gene encoding the polypeptide of SEQ ID NO:1, SEQ ID NO:2, and

22 SEQ ID NO:3 in the sample; and 

23 providing diagnostic, prognostic, or predictive information based on the 



24 detecting step. 
25 

26 5m. The method of any of claims 1m, 2m, 3m, or 4m, wherein the detecting step 

27 comprises detecting the polypeptide or polypeptides. 
28 

29 6m. The method of claim 5m, wherein the polypeptide is detected by performing 

30 immunohistochemical analysis on the sample using an antibody that specifically binds 

31 to the polypeptide. 
32 
1 6ma. The method of claim 5m, wherein the polypeptide is detected by performing an

2 ELISA assay using an antibody that specifically binds to the polypeptide.
3

4 6mb. The method of claim 5m, wherein the polypeptide is detected using an antibody

5 array comprising an antibody that specifically binds to the polypeptide. 
6 



7 6mc. The method of claim 5m, wherein the detecting step comprises: 

8 detecting modification of a substrate by the polypeptide. 
9 

10 9m. The method of any of claims 1m, 2m, 3m, or 4m, wherein the sample is selected

11 from the group consisting of:

12 a blood sample, a urine sample, a serum sample, an ascites sample, a saliva

13 sample, a cell, and a portion of tissue.
14

15 10m. The method of any of claims 1m, 2m, 3m, or 4m, wherein the sample is a tumor

16 sample.

17

18 11m. The method of claim 10m, wherein the tumor sample is a breast tumor sample.
19 



20 1r. A method of testing a subject comprising the steps of:

21 providing a sample isolated from a subject;

22 detecting expression or activity of a gene encoding the polypeptide of SEQ ID

23 NO:1 in the sample; and

24 stratifying the subject for a clinical trial based on the detecting step. 
25 

26 2r. A method of testing a subject comprising the steps of: 

27 providing a sample isolated from a subject; 

28 detecting expression or activity of a gene encoding the polypeptide of SEQ ID 

29 NO:2 in the sample; and 

30 stratifying the subject for a clinical trial based on the detecting step. 
31 

32 3r. A method of testing a subject comprising the steps of: 
1 providing a sample isolated from a subject;

2 detecting expression or activity of a gene encoding the polypeptide of SEQ ID 

3 NO:3 in the sample; and 

4 stratifying the subject for a clinical trial based on the detecting step. 
5 

6 4r. A method of testing a subject comprising the steps of: 

7 providing a sample isolated from the subject; 

8 detecting expression or activity of at least two genes selected from the group 

9 consisting of: a gene encoding the polypeptide of SEQ ID NO:1, SEQ ID NO:2, and

10 SEQ ID NO:3 in the sample; and

11 stratifying the subject for a clinical trial based on the detecting step.

12

13 5r. The method of any of claims 1r, 2r, 3r, or 4r, wherein the detecting step comprises

14 detecting the polypeptide or polypeptides. 
15 

16 6r. The method of claim 5r, wherein the polypeptide is detected by performing 

17 immunohistochemical analysis on the sample using an antibody that specifically binds

18 to the polypeptide. 
19 

20 6ra. The method of claim 5r, wherein the polypeptide is detected by performing an 

21 ELISA assay using an antibody that specifically binds to the polypeptide. 
22 



23 6rb. The method of claim 5r, wherein the polypeptide is detected using an antibody 

24 array comprising an antibody that specifically binds to the polypeptide. 
25 

26 6rc. The method of claim 5r, wherein the detecting step comprises: 

27 detecting modification of a substrate by the polypeptide. 
28 

29 9r. The method of any of claims 1r, 2r, 3r, or 4r, wherein the sample is selected from

30 the group consisting of:

31 a blood sample, a urine sample, a serum sample, an ascites sample, a saliva

32 sample, a cell, and a portion of tissue.
1

2 10r. The method of any of claims 1r, 2r, 3r, or 4r, wherein the sample is a tumor

3 sample.

4

5 11r. The method of claim 10r, wherein the tumor sample is a breast tumor sample.
6 



7 1q. A method of testing a subject comprising the steps of:

8 providing a sample isolated from a subject;

9 detecting expression or activity of a gene encoding the polypeptide of SEQ ID

10 NO:1 in the sample; and

11 selecting a treatment based on the detecting step.
12 

13 2q. A method of testing a subject comprising the steps of: 

14 providing a sample isolated from a subject; 

15 detecting expression or activity of a gene encoding the polypeptide of SEQ ID

16 NO:2 in the sample; and

17 selecting a treatment based on the detecting step.
18

19 3q. A method of testing a subject comprising the steps of:

20 providing a sample isolated from a subject;

21 detecting expression or activity of a gene encoding the polypeptide of SEQ ID

22 NO:3 in the sample; and 

23 selecting a treatment based on the detecting step. 
24 

25 4q. A method of testing a subject comprising the steps of: 

26 providing a sample isolated from the subject; 

27 detecting expression or activity of at least two genes selected from the group 

28 consisting of: a gene encoding the polypeptide of SEQ ID NO:1, SEQ ID NO:2, and

29 SEQ ID NO:3 in the sample; and 

30 selecting a treatment based on the detecting step. 
31 
1 5q. The method of any of claims 1q, 2q, 3q, or 4q, wherein the detecting step

2 comprises detecting the polypeptide or polypeptides. 
3 

4 6q. The method of claim 5q, wherein the polypeptide is detected by performing 

5 immunohistochemical analysis on the sample using an antibody that specifically binds 

6 to the polypeptide. 
7 

8 6qa. The method of claim 5q, wherein the polypeptide is detected by performing an 

9 ELISA assay using an antibody that specifically binds to the polypeptide.
10

11 6qb. The method of claim 5q, wherein the polypeptide is detected using an antibody

12 array comprising an antibody that specifically binds to the polypeptide.
13

14 6qc. The method of claim 5q, wherein the detecting step comprises:

15 detecting modification of a substrate by the polypeptide.
16

17 9q. The method of any of claims 1q, 2q, 3q, or 4q, wherein the sample is selected

18 from the group consisting of:

19 a blood sample, a urine sample, a serum sample, an ascites sample, a saliva 

20 sample, a cell, and a portion of tissue. 
21 

22 10m. The method of any of claims 1m, 2m, 3m, or 4m, wherein the sample is a tumor

23 sample. 
24 

25 11m. The method of claim 10m, wherein the tumor sample is a breast tumor sample. 
26 

27 20. An antibody that specifically binds to an epitope found in a polypeptide whose

28 amino acid sequence comprises the amino acid sequence of SEQ ID NO:1, and wherein the

29 antibody recognizes basal cells in normal mammary lactation glands.
30

31 21. The antibody of claim 20, wherein the antibody distinguishes basal cells from

32 luminal cells in normal mammary lactation glands. 




WO 02/08765 



PCT/US01/23843 



1 

2 22. The antibody of claim 20, wherein the antibody is a monoclonal antibody. 

3 

4 23. The antibody of claim 20, wherein the antibody is a polyclonal antibody. 
5 

6 24. The antibody of claim 20, wherein the antibody recognizes an epitope found in a 

7 peptide having an amino acid sequence selected from the group consisting of SEQ ID 

8 NO:4, SEQ ID NO:5, and SEQ ID NO:6. 
9 

10 25. An antibody that specifically binds to an epitope found in a polypeptide whose 

1 1 amino acid sequence comprises the amino acid sequence of SEQ ID NO:2, and 

12 wherein the antibody recognizes basal cells in normal mammary lactation glands. 
13 

14 26. The antibody of claim 25, wherein the antibody distinguishes basal cells from 

1 5 luminal cells in normal mammary lactation glands. 
16 

17 27. The antibody of claim 25, wherein the antibody is a monoclonal antibody. 
18 

19 28. The antibody of claim 25, wherein the antibody is a polyclonal antibody. 
20 

21 29. The antibody of claim 25, wherein the antibody recognizes an epitope found in a 

22 peptide having an amino acid sequence selected from the group consisting of SEQ ID 

23 NO:7, SEQ ID NO:8, and SEQ ID NO:9. 
24 

25 30. An antibody that specifically binds to an epitope found in a polypeptide whose 

26 amino acid sequence comprises the amino acid sequence of SEQ ID NO:3, and 

27 wherein the antibody recognizes basal cells in normal mammary lactation glands. 
28 

29 31. The antibody of claim 30, wherein the antibody distinguishes basal cells from 

30 luminal cells in normal mammary lactation glands. 
31 

32 32. The antibody of claim 30, wherein the antibody is a monoclonal antibody. 
1

2 33. The antibody of claim 30, wherein the antibody is a polyclonal antibody. 
3 

4 34. The antibody of claim 30, wherein the antibody recognizes an epitope found in a 

5 peptide having an amino acid sequence selected from the group consisting of SEQ ID 

6 NO:10, SEQ ID NO:11, and SEQ ID NO:12.
7 

8 38. A kit for tumor diagnosis comprising: 



9 one or more of the antibodies of any of claims 20 through 34; 

10 instructions for use of the kit; and 

11 a control slide comprising breast tissue samples for testing reagents in the kit. 
12 

13 40. A method of testing a compound or a combination of compounds for activity 

14 against tumors comprising steps of: 

15 obtaining or providing tumor samples taken from subjects who have been

16 treated with the compound or combination of compounds, wherein the tumors fall

17 within a tumor subclass;

18 comparing the response rate of tumors that fall within the tumor subclass and

19 have been treated with the compound with the overall response rate of tumors that 

20 have been treated with the compound or combination of compounds or with the 

21 response rate of tumors that do not fall within the subclass and have been treated with 

22 the compound or combination of compounds; and 

23 identifying the compound or combination of compounds as having selective 



24 activity against tumors in the tumor subclass if the response rate of tumors in the 

25 subclass is greater than the overall response rate or the response rate of tumors that do 

26 not fall within the subclass. 
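
The comparison recited in claim 40 can be sketched in a few lines of Python. This is an illustration only: the record layout (`TumorRecord` with `in_subclass` and `responded` flags) and the function names are assumptions introduced here, and the claim itself does not prescribe any data format or statistical test.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class TumorRecord:
    """One treated tumor (hypothetical record layout)."""
    in_subclass: bool  # e.g. tumor was classified in the basal subclass
    responded: bool    # tumor responded to the compound(s)


def response_rate(records: Iterable[TumorRecord]) -> Optional[float]:
    """Fraction of tumors that responded, or None for an empty group."""
    group = list(records)
    if not group:
        return None
    return sum(r.responded for r in group) / len(group)


def has_selective_activity(records: Sequence[TumorRecord]) -> bool:
    """True if the subclass response rate exceeds the overall rate
    or the rate among tumors outside the subclass."""
    sub = response_rate(r for r in records if r.in_subclass)
    if sub is None:
        return False
    overall = response_rate(records)
    rest = response_rate(r for r in records if not r.in_subclass)
    return any(ref is not None and sub > ref for ref in (overall, rest))
```

The sketch applies the bare "greater than" comparison stated in the claim; an actual study would presumably also assess whether the difference is statistically meaningful.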
27 

28 41. The method of claim 40, wherein the tumors are breast tumors.
29 

30 42. The method of claim 41, wherein the tumor subclass is a basal tumor subclass. 
31 
1 43. The method of claim 41, wherein the tumors are classified according to the

2 method of any of claims 1, 2, 3, or 4.

3

4 44. The method of claim 41, wherein the tumor subclass is a basal tumor subclass and 

5 wherein a tumor is identified as belonging to the tumor subclass based on evidence of 

6 expression of one or more basal marker genes in the sample. 
7 

8 45. The method of claim 44, wherein evidence of expression comprises presence of a 

9 protein encoded by a basal marker gene, and wherein the evidence of expression is 
10 obtained using an antibody that binds to the protein.

11

12 46. The method of claim 45, wherein the basal marker gene encodes a polypeptide

13 comprising the amino acid sequence of SEQ ID NO:1.
14

15 47. The method of claim 45, wherein the basal marker gene encodes a polypeptide

16 comprising the amino acid sequence of SEQ ID NO:2.
17

18 48. The method of claim 45, wherein the basal marker gene encodes a polypeptide

19 comprising the amino acid sequence of SEQ ID NO:3.
20 

21 49. The method of claim 40, wherein the samples are present within a tissue array. 
22 

23 60. A method of testing a compound or a combination of compounds for activity 

24 against tumors comprising steps of: 

25 treating subjects in need of treatment for tumors with the compound or 

26 combination of compounds; 

27 comparing the response rate of tumors that fall within a tumor subclass with 

28 the overall response rate of tumors or with the response rate of tumors that do not fall 

29 within the subclass; and 

30 identifying the compound or combination of compounds as having selective 

31 activity against tumors in the tumor subclass if the response rate of tumors in the
1 subclass is greater than the overall response rate or the response rate of tumors that do

2 not fall within the subclass. 
3 

4 61. The method of claim 60, further comprising the steps of: 

5 providing tumor samples from subjects in need of treatment for tumors; 

6 determining whether the tumors fall within a tumor subclass; and 

7 stratifying the subjects based on the results of the determining step prior to 

8 performing the treating step. 
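
The stratifying step of claim 61 amounts to grouping subjects by the subclass determined for their tumors before any treatment is assigned. A minimal Python sketch follows; the function name, the subject identifiers, and the subclass labels are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping


def stratify(subclass_by_subject: Mapping[str, Hashable]) -> Dict[Hashable, List[str]]:
    """Group subject IDs by the tumor subclass determined for each subject."""
    strata: Dict[Hashable, List[str]] = defaultdict(list)
    for subject_id, subclass in subclass_by_subject.items():
        strata[subclass].append(subject_id)
    # Sort IDs so the order inside each stratum is reproducible.
    return {label: sorted(ids) for label, ids in strata.items()}


# Example with made-up subjects and labels.
strata = stratify({"s1": "basal", "s2": "non-basal", "s3": "basal"})
```

Treatment arms could then be assigned within each stratum, which is one common way to keep the subclasses balanced across arms.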
9 

10 62. The method of claim 60, further comprising the steps of: 

11 providing tumor samples from subjects in need of treatment for tumors;

12 detecting expression or activity of a gene encoding the polypeptide of SEQ ID

13 NO:1 in the samples; and

14 stratifying the subjects based on the results of the detecting step prior to

15 performing the treating step.
16

17 63. The method of claim 60, further comprising the steps of:

18 providing tumor samples from subjects in need of treatment for tumors;

19 detecting expression or activity of a gene encoding the polypeptide of SEQ ID

20 NO:2 in the samples; and 

21 stratifying the subjects based on the results of the detecting step prior to 

22 performing the treating step. 
23 

24 64. The method of claim 60, further comprising the steps of: 

25 providing tumor samples from subjects in need of treatment for tumors; 

26 detecting expression or activity of a gene encoding the polypeptide of SEQ ID 

27 NO:3 in the samples; and 

28 stratifying the subjects based on the results of the detecting step prior to 

29 performing the treating step. 
30 

31 65. The method of claim 60, further comprising the steps of: 

32 providing tumor samples from subjects in need of treatment for tumors; 
1 detecting expression or activity of at least two genes, wherein each of the

2 genes encodes a polypeptide whose sequence comprises a sequence selected from the

3 group consisting of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3 in the samples;

4 and 

5 stratifying the subjects based on the results of the detecting step prior to 

6 performing the treating step. 
7 

8 80. A method of testing a compound or a combination of compounds for activity 

9 against tumors comprising steps of: 

10 treating subjects in need of treatment for tumors with the compound or 

11 combination of compounds or with an alternate compound, wherein the tumors fall

12 within a tumor subclass;

13 comparing the response rate of tumors treated with the compound or

14 combination of compounds with the response rate of tumors treated with the alternate

15 compound; and

16 identifying the compound or combination of compounds as having superior

17 activity against tumors in the tumor subclass, as compared with the alternate

18 compound, if the response rate of tumors treated with the compound or combination

19 of compounds is greater than the response rate of tumors treated with the alternate

20 compound. 
21 

22 81. The method of claim 80, further comprising the steps of: 



23 providing tumor samples from subjects in need of treatment for tumors; 

24 determining whether the tumors fall within a tumor subclass; and 

25 stratifying the subjects based on the results of the determining step prior to 

26 performing the treating step. 
27 

28 82. The method of claim 80, further comprising the steps of: 

29 providing tumor samples from subjects in need of treatment for tumors; 

30 detecting expression or activity of a gene encoding the polypeptide of SEQ ID 

31 NO:1 in the samples; and
1 stratifying the subjects based on the results of the detecting step prior to

2 performing the treating step.

3

4 83. The method of claim 80, further comprising the steps of:

5 providing tumor samples from subjects in need of treatment for tumors; 

6 detecting expression or activity of a gene encoding the polypeptide of SEQ ID 

7 NO:2 in the samples; and 

8 stratifying the subjects based on the results of the detecting step prior to 

9 performing the treating step. 
10 

11 84. The method of claim 80, further comprising the steps of: 

12 providing tumor samples from subjects in need of treatment for tumors; 

13 detecting expression or activity of a gene encoding the polypeptide of SEQ ID

14 NO:3 in the samples; and

15 stratifying the subjects based on the results of the detecting step prior to

16 performing the treating step. 
17 

18 85. The method of claim 80, further comprising the steps of:

19 providing tumor samples from subjects in need of treatment for tumors;

20 detecting expression or activity of at least two genes, wherein each of the

21 genes encodes a polypeptide whose sequence comprises a sequence selected from the

22 group consisting of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3 in the samples;

23 and 

24 stratifying the subjects based on the results of the detecting step prior to 

25 performing the treating step. 
26 

27 86. The method of any of claims 80, 81, 82, 83, 84, or 85, wherein the alternate

28 compound is a compound approved by the U.S. Food and Drug Administration for

29 treatment of tumors.
30

31 100. A method of treating a subject comprising steps of:

32 identifying a subject as having a tumor in a basal tumor subclass; and
1 administering a compound identified according to the method of any of claims

2 40, 41, 42, or 45 to the subject.
3 

4 101. A method of treating a subject comprising steps of: 

5 identifying a subject as having a tumor in a basal tumor subclass; and 

6 administering a compound identified according to the method of any of claims 

7 60, 61, 62, 63, 64, or 65 to the subject.
8

9 103. A method of treating a subject comprising steps of:

10 identifying a subject as having a tumor in a basal tumor subclass; and

11 administering a compound identified according to the method of any of claims

12 80, 81, 82, 83, 84, or 85 to the subject.
13

14 120. A method of treating a subject comprising steps of:

15 providing a subject in need of treatment for cancer;

16 administering to the subject an antibody that specifically binds to a 

17 polypeptide having an amino acid sequence comprising the sequence of SEQ ID 

18 NO:1.
19

20 121. A method of treating a subject comprising steps of:

21 providing a subject in need of treatment for a tumor;

22 administering to the subject an antibody that specifically binds to a

23 polypeptide having an amino acid sequence comprising the sequence of SEQ ID

24 NO:2. 
25 

26 122. A method of treating a subject comprising steps of: 

27 providing a subject in need of treatment for a tumor; 

28 administering to the subject an antibody that specifically binds to a 

29 polypeptide having an amino acid sequence comprising the sequence of SEQ ID

30 NO:3. 
31 
1 130. The method of any of claims 120, 121, or 122, wherein the tumor is a breast

2 tumor, and wherein the method further comprises the step of:

3 identifying the tumor as belonging to a basal tumor subclass.

4 

5 131. The method of any of claims 120, 121, or 122, wherein the antibody is 

6 conjugated with a toxic molecule. 
7 

8 140. A method of treating a subject comprising steps of: 

9 providing a subject in need of treatment for cancer; 

10 administering to the subject a compound that activates or inhibits a gene that 

11 encodes an amino acid having a sequence comprising the sequence of SEQ ID NO:1,

12 or that activates or inhibits an expression product of the gene.
13

14 141. A method of treating a subject comprising steps of:

15 providing a subject in need of treatment for a tumor;

16 administering to the subject a compound that activates or inhibits a gene that

17 encodes an amino acid having a sequence comprising the sequence of SEQ ID NO:2,

18 or that activates or inhibits an expression product of the gene.
19 



20 142. A method of treating a subject comprising steps of: 

21 providing a subject in need of treatment for a tumor; 

22 administering to the subject a compound that activates or inhibits a gene that 

23 encodes an amino acid having a sequence comprising the sequence of SEQ ID NO:3, 

24 or that activates or inhibits an expression product of the gene. 
25 

26 150. A composition comprising: 

27 two or more compounds identified according to the method of any of claims 

28 40, 60, or 80. 
29 

30 151. A pharmaceutical composition comprising:

31 the composition of claim 150; and

32 a pharmaceutically acceptable carrier.
1

2 160. A composition comprising:

3 a compound identified according to the method of any of claims 40, 60, or 80;

4 a second compound, wherein the second compound is approved by the U.S.

5 Food and Drug Administration for the treatment of cancer or has shown potential

6 efficacy against cancer in pre-clinical studies.
7

8 161. A pharmaceutical composition comprising:

9 the composition of claim 160; and

10 a pharmaceutically acceptable carrier.

11 
FIGURE 1A

Sequence of cadherin 3 (GenBank accession number NP_001784)

SEQ ID NO:1



MGLPRGPLASLLLLQVCWLQCAASEPCRAVFREAEVTLEAGGAEQEPGQALGK 

VFMGCPGQEPAIJ'STPM)DFIVRNGETVQERRSLKERNPLKIFPSKPJ^ 

WVVAPISVPENGKGPFPQRLNQLKSNKDRDTKIFYSITGPGADSPPEGVFAVEKE 

TGWLLLNKPLDREEIAKYELFGHAVSENGASVEDPMNISITVTDQNDHKPKFTQD 

TFRGSVLEGVLPGTSVMQVTATDEDDAIYTYNGWAYSrHSQEPKDPHDLMFTI 

HRSTGTISVISSGIJDREKVPEYTLTIQATDM^ 

FDPQKYEAHWENAVGHEVQRLWTDLDAPNSPAWRATYLIMGGDDGDHFTITT 

HPESNQGILTTRKGLDFEAKNQHTLYVEVT^^ 

APVFWPSKVVEVQEGIPTGEPVCVYTAEDPDKENQKISYRII^^ 

SGQVTAVGTLDREDEQFVRNNIYEVMVLA1VIDNGSPPTTGTGTLLLTLIDVNDHG 

PVPEPRQITICNQSPVRHVLNITDKDLSPHTSPFQAQLTDDSDIYWTAEVlsIEEGDT 

WI^LKKFLKQDTYDVHI^LSDHGNKEQLTVIRATVCDCHGHVETCPGPWKGG 

FILPVLGAVLALLFLLLVLLLLVRKKRKKEPLLLPEDDTODNVFYYGEEGGGEE 

DQDYDITQLHRGLEAPJPEVVLRNDVAPTIIPTPMYRP 

NTDPTAPPYDTLLVFDYEGSGSDAASLSSLTSSASDQDQDYDYLNEWGSRFKKL 
ADMYGGGEDD 



Sequence of matrix metalloproteinase 14 (GenBank accession number NP_004986) 
SEQ ID NO:2

MSPAPRPPRCLLU>LLTLGTALASLGSAQSSSFSPEAWLQQYGYLPPGDLRTHTQ 

RSPQSI^AAIAAMQKFYGLQVTGKADADTMKAMRRPRCGWDKFGAEIKANVR 

RKRYAIQGIXWQHNEITFCIQNYTPKVGEYATYEAIRKAFRVWESATPLRFREVP 

YAYIREGHEKQADMIFFAEGFHGDSTPFDGEGGFLAHAYFPGPNIGGDTHFDSA 

EPWTVRNEDIMjNDIFLVAVHELGHALGL^^ 

DDDRRGIQQLYGGESGFPTKMPPQPRTTSRPSWDKPKNPTYGPMCDGNFDTVA 
MLRGEMFWKERWFWRVRNNQVMDGYPMPIGQ^ 

WFKGDKHWWDEASLEPGYPKHKELGRGLPTDKIDAALFWMPNGKTYFFRGN 
K YYRFNEELRA VD SE YPKMKV WEG1PESPRGSFMGSDEVFTYFYKGNKYWKFN 
NQKLKVEPGYPKSALRDWMGCPSGGRPDEGTEEETEVnmVDEEGGGAVSAAA 
WLPVLLLLLVLAVGLAVFFFRRHGTPRRLLYCQRSLLDKV 



FIGURE 1B
FIGURE 1C

Sequence of cadherin EGF LAG seven-pass G-type receptor 2 (GenBank accession
number NP_001399)

SEQ ID NO:3

MRSPATGVPLPTPPPPLLLLLLLLLPPPLLGDQVGPCRSLGSRGRGSSGACAPMG 

WLCPSSASNLWLYTSRCRDAGTELTGHLVPHHDGLRVWCPESEAHIPLPPAPEG 

CPWSCRLLGIGGHLSPQGKLTLPEEHPCLKAPRLRCQSCKLAQAPGLRAGERSPE 

ESLGGRRKRNTVNTAPQFQPPSYQATVPENQPAGTPVASLRAIDPDEGEAGRLEYT 

N1DALFDSRSNQFFSLDPVTGAVTTAEELDRETKSTHVFRVTAQDHGMPRRSALA 

TLTILVTDTNDHDPVFEQQEYKESLRENLEVGYEVLTVRATDGDAPPNANILYRL 

LEGSGGSPSEVFEIDPRSGVIRTRGPVDREEVESYQLTVEASDQGRDPGPRSTTAA 

VFLSVEDDNDNAPQFSEKRYVVQVREDVTPGAPVLRVTASDRDKGSNAWHYSI 

MSGNARGQFYLDAQTGALDVVSPLDYETTKEYTLRVRAQDGGRPPLSNVSGLV 

TVQVLDINDNAPIFVSTPFQAWLESWLGYLVLHVQAIDADAGDNARLEYRLAG 

VGHDFPFTINNGTGWISVAAELDREEVDFYSFGVEARDHGTPALTASASVSVTVL 

DVNDNNPTFTQPEYTVRLNEDAAVGTSWWSAVDRDAHSVITYQITSGNTRNR 

FSITSQSGGGLVSLALPII)YKLERQYVIAVTASDGTRQD^^ 

VFQSSHYTVTSTVNEDRPAGTTWLISATDEDTGENARnYFMEDSlPQF 

AVTTQAELDYEDQVSYTLAITARDNGIPQKSDTTYLEILVNDVNDNAPQFLRDSY 

QGSVYEDWPFTSVLQISATDRDSGLNGRVFYTFQGGDDGDGDFIVESTSGIVRT 

LRRLDRENVAQYVLRAYAVDKGMPPARTPMEVTVTVLDVNDNPPVFEQDEFDV 

FVEENSPIGLAVARVTATDPDEGTNAQIMYQIVEGNIPEVFQLDIFSGELTALVDL 

DYEDRPEYVLVIQATSAPLVSRATVHVRLL^ 

SFPGGAIGRWAHDPDISDSLTYSFERGNELSL\T.LNASTGEIiXSRALDNNRPLE 

AMSVLVSDGVHSVTAQCALRVTIITDEMLTHSITLRLEDMSPERFLSPLLGLFIQA 

VAATLATPPDHVVVFNVQRDTDAPGGHILNVSLSVGQPPGPGGGPPFLPSEDLQE 

RLYLNRSLLTAISAQRVLPFDDNICLREPCENYMRCVSVLRFDSSAPFIASSSVLFR 

PIHPVGGLRCRCPPGFTGDYCETEVDLCYSRPCGPHGRCRSREGGYTCLCRDGYT 

GEHCEVSARSGRCTPGVCKNGGTCVNLLVGGFKCDCPSGDFEKPYCQVTTRSFP 

AHSFITFRGLRQRFHFTIALSFATKSPJ)GLLLYNGRFNEKHDFVALEVIQEQVQL 

TFSAGESTTTVSPFN^GGVSDGQWHWQLKYYNKPLLGQTGLPQGPSEQKVAVV 

TVDGCI)TGVALRFGSVLGNYSCAAQGTQGGSKKSIJ)LTGPLLLGGWDIJ , ESFP 

VRmQFVGCMRNLQVDSRHroMADFIANNGTWGCPAKKNVCDSNTCHNGGT 

CVNQWDAFSCECTLGFGGKSCAQEMANPQHFLGSSLVAWHGLSLPISQPWYLSL 

MFRTRQADGVLLQAITRGRSTITLQLREGHVMLSVEGTGLQASSLRLEPGRAND 

GDWHHAQLALGASGGPGHAII^FDYGQQRAEGNLGPRLHGLHLSNTTVGGIPGP 

AGGVARGFRGCLQGVRVSDTPEGVNSLDPSHGESINVEQGCSLPDPCDSNPCPA 

NSYCSNDWDSYSCSCDPGYYGDNCTNVCDLNPCEHQSVCIRKPSAPHGYTCEC 

PPNYLGPYCETRIDQPCPRGWWGHPTCGPCNCDVSKGFDPDCNKTSGECHCKEN 

HYRPPGSPTCLLCDCYPTGSLSRVCDPEDGQCPCKPGVIGRQCDRCDNPFAEVTT 

NGCEVNYDSCPRAIEAGIWWPRTRFGIJAAAPCPKGSFGTAVPJICDEHRGWLPP 

MJ^CTSITFSELKGFAERLQRbffiSGIJDSGRSQQIALLLRNATQHTAGWGSDW 

VAYQIATPJLLAHESTQRGFGLSATQDVHFTENLLRVGSALLDTANKRHWELIQQ 
TEGGTAWLLQHYEAYASALAQNMRHTYLSPFT1VTPNIVISVVRLDKGNFAGAK

LPRYEALRGEQPPDLETTVILPESVFRETPPVVRPAGPGEAQEPEELARRQRRiBPE 

LSQGEAVASVIIYRTLAGLIJPH^^roPDKRSLRWKRPIINTPWSISVHDDEELLPR 

ALDKPVTVQFRLLETEERTKRICVFWNHS1LVSGTGGWSARGCEVVFRNESHVSC 

QCNHMTSFAVLMDVSRRENGEILPLKTLTYVALGVTLAALLLTFFFLTLLRILRS 

NQHGIRRNLTAALGI^QLWLLGINQADIJFACTVIAILLHFLYLCTFSWALLEAL 

HLYRALTEVRDWTGPMRFmiLGWGWAFITGLAVGIJDPEGYGNPDFCWLSI 

YDTLIWSFAGPVAFAVSMSVFLYILAARASCAAQRQGFEKKGPVSGLQPSFAVLL 

LI^ATWLIALLSVNSbTLLFHYLFATCNCIQGPFIFLSYVVI^KEVRKALKLACSR 

KPSPDPALTTKSTLTSSYNCPSPYADGRLYQPYGDSAGSLHSTSRSGKSQPSY1PF 

LLRBESALNPGQGPPGLGDPGSLFLEGQDQQHDPDTDSDSDLSLEDDQSGSYAST 

HSSDSEEEEEEEEEEAAFPGEQGWDSLLGPGAERLPLHSTPKDGGPGPGKAPWPG 

DFGTTAKESSGNGAPEERLRENGDALSREGSLGPLPGSSAQPHKGILKKKCLPTIS 

EKSSLIJULPLEQCTGSSRGSSASEGSRGGPPPRPPPRQSLQEQLNGVMPIAMSIKA 

GTVDEDSSGSEFLFFNFLH 
FIGURE 1D

Peptides for antibodies that bind to cadherin3 (GenBank accession number NP_001784):
RAVFREAEVTLEAGGAEQE (SEQ ID NO:4)
QEPALFSTDNDDFTVRN (SEQ ID NO:5)
QKYEAHVPENAVGHE (SEQ ID NO:6) 

Peptides for antibodies that bind to matrix metalloproteinase 14 (GenBank accession 
number NP_004986): 

AYIREGHEKQADIMIFFAE (SEQ ID NO:7) 
DEASLEPGYPKHIKELGR (SEQ ID NO:8) 
RGSFMGSDEVFTYFYK (SEQ ID NO:9) 

Peptides for antibodies that bind to anti-cadherin EGF LAG seven-pass G-type receptor 2 
(GenBank accession number NP_001399): 

QASSLRLEPGRANDGDWH (SEQ ID NO:10) 

ELKGFAERLQRNESGLDSGR (SEQ ID NO:11)

RSGKSQPSYIPFLLREE (SEQ ID NO:12)

Peptides for antibodies that bind to anti-cytokeratin 17:
KKEPVTTRQVR1WEE (SEQ ID NO:13)
QDGKVISSREQVHQTTR (SEQ ID NO:14)
SSSKGSSGLGGGSS (SEQ ID NO:15)
FIGURE 2

Intrinsic Gene Subset

Epithelial-Enriched Gene Subset

FIGURE 3

FIGURE 4A

FIGURE 4B

FIGURE 4C

FIGURE 5A

FIGURE 5B

FIGURE 6

FIGURE 7
Table 3

Common Reference Cell Line List

Name        Description                               ATCC # or Reference
MCF7        breast adenocarcinoma derived cell line   ATCC #HTB-22
Hs578T      breast carcinosarcoma derived cell line   ATCC #HTB-126
NTERA2      teratoma derived cell line                ATCC #CRL-1973
Colo205     colon tumor derived cell line             ATCC #CCL-222
OVCAR-3     ovarian tumor derived cell line           ATCC #HTB-161
UACC-62     melanoma derived cell line                Stinson et al. Anticancer Res. Jul-Aug;12(4):1035-53, 1992
MOLT-4      T-cell leukemia derived cell line         ATCC #CRL-1582
RPMI-8226   multiple myeloma derived cell line        ATCC #CCL-155
NB4+ATRA    APL-like cell line                        Lanotte et al. Blood Mar 1;77(5):1080-6, 1991
SW872       liposarcoma derived cell line             ATCC #HTB-92
HepG2       liver tumor derived cell line             ATCC #HB-8065
ESTS, WEAKLY SIMILAR TO !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.SAPIENS] H97778
KIAA0182 AI023801
DOWNSTREAM OF A PUTATIVE CPG ISLAND. CONTAINS ESTS AND GSSS AA045658
ESTS, HIGHLY SIMILAR TO STAT4 [M.MUSCULUS] R91570
UNTITLED R16098
MATRIX METALLOPROTEINASE 15 (MEMBRANE-INSERTED) AA443300
ERBB-2 RECEPTOR PROTEIN-TYROSINE KINASE PRECURSOR AA025141
DERIVED ONCOGENE HOMOLOG) AA443351
SWI/SNF RELATED, MATRIX ASSOCIATED, ACTIN DEPENDENT REGULATOR OF CHROMATIN, SUBFAMILY E, MEMBER 1 W63613
TNF RECEPTOR-ASSOCIATED FACTOR 4 AA598826
1347348 W81186
FLOTILLIN 2 R73545
TGFB1-INDUCED ANTI-APOPTOTIC FACTOR 1 AA446222
KIAA0130 GENE PRODUCT N76581
ESTS, HIGHLY SIMILAR TO INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN 2 PRECURSOR [H.SAPIENS] H79047
INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN 5 (IGFBP5) AA054451
HUMAN INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN 5 (IGFBP5) MRNA H08560
HUMAN INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN 5 (IGFBP5) MRNA T52830
PHOSPHOSERINE PHOSPHATASE-LIKE W05628
122982 R00332
CYTOCHROME C OXIDASE SUBUNIT VIC AA456931
78921 T60482
134783 R31701
FIBRONECTIN 1 R62612
H.SAPIENS MRNA FOR INHIBIN BETA(A) SUBUNIT N27159
CALCIUM/CALMODULIN-DEPENDENT PROTEIN KINASE (CAM KINASE) II GAMMA T96083
839904 AA490059
MEMBRANE FATTY ACID (LIPID) DESATURASE W49667
RIBOSOMAL PROTEIN L26 AA633569
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| HUMAN CHROMOSOME 16 BAC CLONE CIT987SK-A-362G6 N75498 1 


| NON-SPECIFIC CROSS REACTING ANTIGEN AA054073 ! 


1261194 H98215 1 


(503602 AA131299 I 


150114 H16743 ~~ | 


|ESTS, HIGHLY SIMILAR TO CYTOPLASMIC DYNEIN LIGHT CHAIN 1 TH. SAPIENS! AA401429 . 1 


|STEAR0YL-C0A DESATURASE (DELTA-9-DESATURASE) R00707 1 


| LYSOSOMAL-ASSOCIATED MULTISPANNING MEMBRANE PROTEIN-5 AA410265 1 


IACTIN RELATED PROTEIN 2/3 COMPLEX, SUBUNIT 5 (16 KD) W55964 I 


1345056 W72798 | 


1487831 AA045083 I 


ICATHEPSIN K (PYCNQDYSOSTOSIS) R01515 I 


| ESTS, WEAKLY SIMILAR TO MACROPHAGE LECTIN 2 [H.SAPIENS] N53421 1 


| DERMATOPONTIN R48303 I 


TISSUE INHIBITOR OF METALLOPROTEINASE 3 (SORSBY FUNDUS DYSTROPHY, PSEUDOINFLAMMATORY) 
AA445923 


| INTEGRIN, ALPHA 2 (CD49B, ALPHA 2 SUBUNIT OF VLA-2 RECEPTOR) AA069096 i 


[INTEGRIN, ALPHA 2 (CD49B, ALPHA 2 SUBUNU OF VLA-2 RECEPTOR) AA463610 1 


ISER-THR PROTEIN KINASE RELATED TO THE MYOTONIC DYSTROPHY PROTEIN KINASE N35241 1 


|ESTS, WEAKLY SIMILAR TO (DEFLINE NOT AVAILABLE 5262644) TH.SAPIENS1 N91426 ! 


| MICROTUBULE-ASSOCIATED PROTEIN IB AA219045 I 


|259996 N32611 | 


'ESTS, WEAKLY SIMILAR TO !!!!ALU SUBFAMILY SBl WARNING ENTRY !!!! fH.SAPIENSl N21103 1 


141726 R69584 | 


842848 AA486281 j 


C3H-TYPE ZINC FINGER PROTEIN; SIMILAR TO D. MELANOGASTER MUSCLEBLIND B PROTEIN W16832 ! 


NUCLEAR FACTOR I/X (CCAAT-BINDING TRANSCRIPTION FACTOR) AA406269 1 


ESTS, HIGHLY SIMILAR TO LAR-INTERACTING PROTEIN 1A [H.SAPIENS] N52679 1 


FAS (TN FRSF6)-ASS0CIATED VIA DEATH DOMAIN AA430751 \ 


ESTS, WEAKLY SIMILAR TO STRABISMUS [D. MELANOGASTER] T95333 I 


HOMO SAPIENS CLONE 23704 MRNA SEQUENCE N70212 ! 


RAB6, MEMBER RAS ONCOGENE FAMILY H20138 | 


HUMAN MRNA FOR KIAA0280 GENE, PARTIAL CDS AA428746 | 


501731 AA127861 | 


V-MYC AVIAN MYEL0CYTOMATOSIS VIRAL ONCOGENE HOMOLOG 1, LUNG CARCINOMA DERIVED R62862 1 


66864 T64994 I 


PATERNALLY EXPRESSED GENE 3 AA459941 I 


ESTS, WEAKLY SIMILAR TO PLACENTAL RIBONUCLEASE INHIBITOR fH.SAPIENSl N53214 1 


N-MYC AA101678
HOMO SAPIENS CLONE 23698 MRNA SEQUENCE AA680300
ESTS, MODERATELY SIMILAR TO FAT-SPECIFIC PROTEIN FSP27 [M.MUSCULUS] AA088749
HOMO SAPIENS BRAIN MY047 PROTEIN MRNA, COMPLETE CDS T62031
MESENCHYME HOMEO BOX 1 AA426311
HHCPA78 HOMOLOG AA044633
ENDOTHELIAL KRUPPEL-LIKE ZINC FINGER PROTEIN H45711
CYCLIN-DEPENDENT KINASE 5, REGULATORY SUBUNIT 1 (P35) AA442853
FBJ MURINE OSTEOSARCOMA VIRAL ONCOGENE HOMOLOG B T62179
79412 T57691
DUAL SPECIFICITY PHOSPHATASE 6 AA630374
LAMININ, GAMMA 2 (NICEIN (100KD), KALININ (105KD), BM600 (100KD), HERLITZ JUNCTIONAL EPIDERMOLYSIS BULLOSA)) AA677534
MATRIX METALLOPROTEINASE 14 (MEMBRANE-INSERTED) N33214
COLLAGEN, TYPE XVII, ALPHA 1 H87536
CALPONIN 1, BASIC, SMOOTH MUSCLE AA399519
PLEIOTROPHIN (HEPARIN BINDING GROWTH FACTOR 8, NEURITE GROWTH-PROMOTING FACTOR 1) AA001449
PLEIOTROPHIN (HEPARIN BINDING GROWTH FACTOR 8, NEURITE GROWTH-PROMOTING FACTOR 1) AA001449
1912786 AI304356
GELSOLIN (AMYLOIDOSIS, FINNISH TYPE) H72027
BULLOUS PEMPHIGOID ANTIGEN 1 (230/240KD) H44784
SMALL INDUCIBLE CYTOKINE SUBFAMILY D (CYS-X3-CYS), MEMBER 1 (FRACTALKINE, NEUROTACTIN) R66139
KERATIN 17 AA026642
KERATIN 17 AA026642
KERATIN 5 (EPIDERMOLYSIS BULLOSA SIMPLEX DOWLING-MEARA/KOBNER/WEBER-COCKAYNE TYPES) W72110
ESTS, HIGHLY SIMILAR TO KERATIN K5, 58K TYPE II, EPIDERMAL W72110
ESTS, HIGHLY SIMILAR TO PROBABLE ATAXIA-TELANGIECTASIA GROUP D PROTEIN [H.SAPIENS] AA055486
CRYSTALLIN, ALPHA B AA504943
CAVEOLIN 2 T89391
ANNEXIN I (LIPOCORTIN I) H63077
DYSTROPHIN (MUSCULAR DYSTROPHY, DUCHENNE AND BECKER TYPES), INCLUDES DXS142, DXS164, DXS206, DXS230, DXS239, DXS268, DXS269, DXS270, DXS272 AA461118
DIHYDROPYRIMIDINASE-LIKE 2 AA487674
272038 N31948
CYSTEINE DIOXYGENASE, TYPE I AA497111
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Table 7: Epithelial-enriched gene set 

197474 

H52098

786609

AA478481

FIBROBLAST ACTIVATION PROTEIN, ALPHA

AA405569 

LARGE FIBROBLAST PROTEOGLYCAN PRECURSOR 

AA056022 

LARGE FIBROBLAST PROTEOGLYCAN PRECURSOR 

AA056022 

CHONDROITIN SULFATE PROTEOGLYCAN CORE PROTEIN 

AA722599 

PLASMINOGEN ACTIVATOR, UROKINASE RECEPTOR 

AA147962

FIBRONECTIN 1 

R62612 

FIBRONECTIN 1 

R62612 

HUMAN ISOLATE JUSO MUC18 GLYCOPROTEIN MRNA (3' VARIANT), COMPLETE CDS

AA497002 

H.SAPIENS MRNA FOR INHIBIN BETA(A) SUBUNIT 

N27159 

HUMAN MRNA FOR FIBRONECTIN (FN PRECURSOR) 

N26285

ESTS, MODERATELY SIMILAR TO !!!! ALU SUBFAMILY SQ WARNING ENTRY !!!! [H.SAPIENS]

H77494 

244703 

N52533 

HOMO SAPIENS MRNA FOR NIDOGEN-2 

AA479199 

LIM DOMAIN ONLY 7 

H22826 

TACHYKININ, PRECURSOR 1 (SUBSTANCE K, SUBSTANCE P, NEUROKININ 1, NEUROKININ 2, NEUROMEDIN L,
NEUROKININ ALPHA, NEUROPEPTIDE K, NEUROPEPTIDE GAMMA)

AA446659 

INTERLEUKIN 1, BETA 

W47101 

INTERLEUKIN 1, BETA 

AA150507 

RAS-RELATED C3 BOTULINUM TOXIN SUBSTRATE 1 (RHO FAMILY, SMALL GTP BINDING PROTEIN RAC1) 

AA626787

PROTEIN TYROSINE PHOSPHATASE J

AA644448

ESTS, WEAKLY SIMILAR TO !!!! ALU SUBFAMILY SB1 WARNING ENTRY !!!! [H.SAPIENS]

N21103 

FAT TUMOR SUPPRESSOR (DROSOPHILA) HOMOLOG 

A159194 

271952 

N35301 

179276 

H50323 

INOSITOL POLYPHOSPHATE-5-PHOSPHATASE, 145KD 
AA521067

CHOLINERGIC RECEPTOR, NICOTINIC, EPSILON POLYPEPTIDE

R02058

ALDO-KETO REDUCTASE FAMILY 1, MEMBER C1 (DIHYDRODIOL DEHYDROGENASE 1; 20-ALPHA (3-ALPHA)-HYDROXYSTEROID DEHYDROGENASE)

R93124 

TUMOR NECROSIS FACTOR RECEPTOR SUPERFAMILY, MEMBER 6 

AA293571

CYSTATIN A (STEFIN A)

W72207

347436

W81192

ANTILEUKOPROTEINASE

AA026192

JAGGED1 (ALAGILLE SYNDROME) 

R70685 

PRION PROTEIN (P27-30) (CREUTZFELD-JAKOB DISEASE, GERSTMANN-STRAUSLER-SCHEINKER SYNDROME,
FATAL FAMILIAL INSOMNIA)

AA455969

ESTS, WEAKLY SIMILAR TO KIAA0639 PROTEIN [H.SAPIENS]

AA284277 

843045 

AA488420 

ALDEHYDE DEHYDROGENASE 6 
AA455235 

CADHERIN 3, P-CADHERIN (PLACENTAL) 

AA425556

MDGI/FATTY ACID BINDING PROTEIN 3, MUSCLE AND HEART

W04872

TROPONIN I, SKELETAL, FAST

AA181334

MATRIX METALLOPROTEINASE 14 (MEMBRANE-INSERTED)

N33214

LAMININ, GAMMA 2 (NICEIN (100KD), KALININ (105KD), BM600 (100KD), HERLITZ JUNCTIONAL EPIDERMOLYSIS
BULLOSA))

AA677534 

ANNEXIN VIII
AA252968

ESTS, HIGHLY SIMILAR TO PROBABLE ATAXIA-TELANGIECTASIA GROUP D PROTEIN [H.SAPIENS]

AA055486

KERATIN 17

AA026642

KERATIN 17

AA026642

ESTS, HIGHLY SIMILAR TO KERATIN K5, 58K TYPE II, EPIDERMAL [H.SAPIENS]

AA160507

KERATIN 5 (EPIDERMOLYSIS BULLOSA SIMPLEX DOWLING-MEARA/KOBNER/WEBER-COCKAYNE TYPES)
W72110

ESTS, HIGHLY SIMILAR TO KERATIN K5, 58K TYPE II, EPIDERMAL

W72110

BULLOUS PEMPHIGOID ANTIGEN 1 (230/240KD)
H44784 

S100 CALCIUM-BINDING PROTEIN A2 

AA458884 

INTEGRIN, BETA 4

AA485668

INTEGRIN, BETA 4

AA076514

2255577

AI679149

LAMININ, ALPHA 3 (NICEIN (150KD), KALININ (165KD), BM600 (150KD), EPILEGRIN)

AA001432

COLLAGEN, TYPE XVII, ALPHA 1 

H87536 

BASONUCLIN

R26526

504940 

AA150619 

HUMAN DNA SEQUENCE FROM CLONE 973M2 ON CHROMOSOME 1Q24.3-31.1 CONTAINS PROSTAGLANDIN- 
ENDOPEROXIDE SYNTHASE 2 (PROSTAGLANDIN G/H SYNTHASE AND CYCLOOXYGENASE) GENE, ESTS, STS, 
GSSS 

AA644211 

810904 

AA459285

MYOSIN IC

AA029956

EPHRIN-B1

AA428778 

MATRIX METALLOPROTEINASE 7 (MATRILYSIN, UTERINE) 

AA031513 

294682 

W01603 

INTEGRIN, ALPHA 3 (ANTIGEN CD49C, ALPHA 3 SUBUNIT OF VLA-3 RECEPTOR) 

AA293040 

INTEGRIN, ALPHA 3 (ANTIGEN CD49C, ALPHA 3 SUBUNIT OF VLA-3 RECEPTOR) 

AA424695 

SERUM AMYLOID A1 

H25546 

GM2 GANGLIOSIDE ACTIVATOR PROTEIN 

AA453978

ESTS, WEAKLY SIMILAR TO TRANSPOSON LRE2 REVERSE TRANSCRIPTASE HOMOLOG [H.SAPIENS]

W48580

CARBONIC ANHYDRASE II
H23187 

LATENT TRANSFORMING GROWTH FACTOR BETA BINDING PROTEIN 2 
AA424629 

SECRETED FRIZZLED-RELATED PROTEIN 1

T68892

LECTIN, GALACTOSIDE-BINDING, SOLUBLE, 7 (GALECTIN 7)
W72436

PLASMINOGEN ACTIVATOR, UROKINASE

AA284668 

ENDOTHELIN RECEPTOR TYPE A 

AA452627 

ESTS, HIGHLY SIMILAR TO (DEFLINE NOT AVAILABLE 5231137) [H.SAPIENS] 

W30988 

N-MYC DOWNSTREAM REGULATED

AA489261 

EPIDERMAL GROWTH FACTOR RECEPTOR (AVIAN ERYTHROBLASTIC LEUKEMIA VIRAL (V-ERB-B) ONCOGENE 
HOMOLOG) 

AA234783 

359285

AA016234 

INTERLEUKIN 4 RECEPTOR 
AA292025 

DIACYLGLYCEROL KINASE, ALPHA (80KD)
AA456900

770670

AA476272

FOLATE RECEPTOR 1 (ADULT) 

R24635 

HUMAN MRNA FOR KIAA0300 GENE, PARTIAL CDS 

AA405458 

HUMAN GABA-A RECEPTOR EPSILON SUBUNIT (GABRE) RNA, ALTERNATIVE TRANSCRIPT 

H63934

SMALL INDUCIBLE CYTOKINE SUBFAMILY D (CYS-X3-CYS), MEMBER 1 (FRACTALKINE, NEUROTACTIN)

R66139

HUMAN DNA SEQUENCE FROM PAC 196E23 ON CHROMOSOME XQ26.1-27.2. CONTAINS THE TAT-SF1 (HIV-1 
TRANSCRIPTIONAL ELONGATION FACTOR TAT COFACTOR TAT-SF1) GENE, THE BRS3 (BOMBESIN RECEPTOR 
SUBTYPE-3 (UTERINE BOMBESIN RECEPTOR, BRS-3) GEN 

AA700322

HUMAN DNA-BINDING PROTEIN ABP/ZF MRNA, COMPLETE CDS

W88571

PHOSPHATIDYLINOSITOL-4-PHOSPHATE 5-KINASE, TYPE I, BETA

R39069

51406

H18950

503051

AA149250

FATTY ACID BINDING PROTEIN 7, BRAIN 

N46862 

FATTY ACID BINDING PROTEIN 7, BRAIN 

W72051 

MACROPHAGE RECEPTOR WITH COLLAGENOUS STRUCTURE 

AA485867

HOMO SAPIENS MRNA FOR CALPAIN-LIKE PROTEASE CANPX

AA457330

298662

N74313

FORKHEAD (DROSOPHILA)-LIKE 7

N22552

ESTS, WEAKLY SIMILAR TO !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.SAPIENS]

AA459296

MEGAKARYOCYTE POTENTIATING FACTOR 

AA488406 

PREFERENTIALLY EXPRESSED ANTIGEN OF MELANOMA 

AA598817

EYES ABSENT (DROSOPHILA) HOMOLOG 2 

AA402207 

SYNAPTOGYRIN 1 

AA007632 

PHOSPHOLIPASE C, BETA 4 

H22563 

TRANSCRIPTION FACTOR AP-2 GAMMA (ACTIVATING ENHANCER-BINDING PROTEIN 2 GAMMA) 

AA399334 

KERATIN 4

AA629189 

BONE MORPHOGENETIC PROTEIN 7 (OSTEOGENIC PROTEIN 1) 

AA029597 

BONE MORPHOGENETIC PROTEIN 7 (OSTEOGENIC PROTEIN 1) 

W73473 

KIAA0626 GENE PRODUCT 
N62737 

HUMAN MRNA FOR KIAA0338 GENE, PARTIAL CDS 
R71689 
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CERULOPLASMIN (FERROXIDASE) 

H86554

ESTS, MODERATELY SIMILAR TO (DEFLINE NOT AVAILABLE 4159884) [H.SAPIENS] 

AA001222 

DESMOCOLLIN 2 

AA074677 

321902

W37448 

KERATIN 13 

W60057 

KERATIN 13 

W23757 

134011 

R31262

49630 

H29256 

VITAMIN D (1,25-DIHYDROXYVITAMIN D3) RECEPTOR

AA485226

SYNDECAN 1 

AA074511 

SEMA DOMAIN, IMMUNOGLOBULIN DOMAIN (IG), SHORT BASIC DOMAIN, SECRETED, (SEMAPHORIN) 3F 

AA454570 

PROTEIN TYROSINE PHOSPHATASE, RECEPTOR TYPE, F 

AA598513 

BUTYRATE RESPONSE FACTOR 1 (EGF-RESPONSE FACTOR 1) 

AA424743

ANTHRACYCLINE RESISTANCE-ASSOCIATED

AA495766

MEMBRANE COMPONENT, CHROMOSOME 1, SURFACE MARKER 1 (40KD GLYCOPROTEIN, IDENTIFIED BY
MONOCLONAL ANTIBODY GA733)

AA454810

KERATIN 7

AA489569

813520 

AA455591 

HOMO SAPIENS MRNA; CDNA DKFZP586B2022 (FROM CLONE DKFZP586B2022) 

T52325 

HOMO SAPIENS AGRIN PRECURSOR MRNA, PARTIAL CDS 

AA458878

ESTS, WEAKLY SIMILAR TO KIAA0319 [H.SAPIENS]

AA136133

ANTIQUITIN 1

AA101299

HEXOKINASE 1

AA485272

HEXOKINASE 1

AA485271

LADININ 1

T97710 

H.SAPIENS MRNA FOR RECEPTOR TYROSINE KINASE EPH (PARTIAL) 

N90246 

144834

R77251

CREATINE KINASE, MITOCHONDRIAL 1 (UBIQUITOUS)
AA019482

364302

AA022462

176461

H43515 
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RECEPTOR PROTEIN-TYROSINE KINASE EDDR1 

H41900 

HOMO SAPIENS MRNA FOR INOSITOL 1,4,5-TRISPHOSPHATE 3-KINASE ISOENZYME, PARTIAL CDS

N46828

PLEXIN5

AA496565

810873

AA459197

504225

AA131934

SNF2-RELATED CBP ACTIVATOR PROTEIN

AA419088

ESTS, WEAKLY SIMILAR TO !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.SAPIENS]

H97778

ESTS, WEAKLY SIMILAR TO KIAA0281 [H.SAPIENS]

N54395

85804

T72068

JUNCTION PLAKOGLOBIN

R06417 

CDP-DIACYLGLYCEROL SYNTHASE (PHOSPHATIDATE CYTIDYLYLTRANSFERASE) 1

R31562

PROLINE-RICH GLA (G-CARBOXYGLUTAMIC ACID) POLYPEPTIDE 2

AA430552 

HUMAN DNA SEQUENCE FROM PAC 127B20 ON CHROMOSOME 22Q11.2-QTER, CONTAINS GENE FOR GTPASE- 
ACTIVATING PROTEIN SIMILAR TO RHOGAP PROTEIN. RIBOSOMAL PROTEIN L6 PSEUDOGENE, ESTS AND CA 
REPEAT 

AA037410

ESTS, WEAKLY SIMILAR TO LOW-DENSITY LIPOPROTEIN RECEPTOR-RELATED PROTEIN 1 PRECURSOR 
[H.SAPIENS] 

AA489246 

416386

W86859

PLACENTAL BIKUNIN (KUNITZ-TYPE SERINE PROTEASE INHIBITOR)

AA031287

SERINE PROTEASE INHIBITOR, KUNITZ TYPE, 2
AA459039

HUMAN PLACENTAL BIKUNIN MRNA COMPLETE CDS

AA031287

810728

AA457707 

HOMO SAPIENS MRNA; CDNA DKFZP586F1318 (FROM CLONE DKFZP586F1318) 

T77847 

147447

R81173

365517
AA009593

417081

W87826

ESTS, WEAKLY SIMILAR TO (DEFLINE NOT AVAILABLE 4929751) [H.SAPIENS]

AA004846

HOMO SAPIENS MRNA; CDNA DKFZP586J2118 (FROM CLONE DKFZP586J2118)

R98407

297604
N69835

297604

N69835

DNA SEGMENT, SINGLE COPY PROBE LNS-CAI/LNS-CAII (DELETED IN POLYPOSIS
H99681 
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ESTROGEN RECEPTOR 1 

AA164586

275798

R93185

TUMOR PROTEIN D52

AA459318

HUMAN D9 SPLICE VARIANT B MRNA, COMPLETE CDS

AA453832

MAJOR GASTROINTESTINAL TUMOR-ASSOCIATED PROTEIN GA733-2 PRECURSOR

AA055808

KIAA0351 GENE PRODUCT

AA402863

RAB2, MEMBER RAS ONCOGENE FAMILY-LIKE 

AA401972 

NEBULETTE 

N77806 

ESTS, WEAKLY SIMILAR TO UNKNOWN [H.SAPIENS] 

R01499 

486828

AA042878 

486828 

AA042878 

XMP

T84249

EPITHELIAL MEMBRANE PROTEIN 2

T88721

KERATIN 8

AA598517

44292

H06273

KERATIN 18

AA070385

KERATIN 18 

AA664179 

CLAUDIN 4 

AA430665 

HCPE-R MRNA FOR CPE-RECEPTOR

AA506754

HCPE-R MRNA FOR CPE-RECEPTOR
W74492

HOMO SAPIENS EPITHELIAL-SPECIFIC TRANSCRIPTION FACTOR ESE-1A (ESE-1) MRNA, COMPLETE CDS

AA433851

EPITHELIAL-SPECIFIC TRANSCRIPTION FACTOR ESE-1B (ESE-1) MRNA COMPLETE CDS

H27938 

SERINE PROTEASE INHIBITOR, KUNITZ TYPE 1 

AA464250 

TRANSFORMING GROWTH FACTOR, BETA 3 

AA040617 

TRANSFORMING GROWTH FACTOR BETA 3 

AA040616 

TRANSFORMING GROWTH FACTOR BETA 3 

AA040616 

LYSOSOMAL-ASSOCIATED MEMBRANE PROTEIN 1 

H29077 

ISLET CELL AUTOANTIGEN 1 (69KD) 
AA491302 

ESTS, MODERATELY SIMILAR TO K02E10.2 [C.ELEGANS] 
T62552 
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82869 

T69270

SELENIUM BINDING PROTEIN 1 

T65736 

HOMO SAPIENS MRNA FOR HYPOTHETICAL PROTEIN 

AA487488 

PROLACTIN RECEPTOR

R63647

321658 

W32933 

321658

W32933 

202658 

H53479 

202658

H53479

ESTS, MODERATELY SIMILAR TO !!!! ALU SUBFAMILY SQ WARNING ENTRY !!!! [H.SAPIENS]

AA464739

197520

H52110

KIAA0182

AA037466

HUMAN MRNA FOR KIAA0182 GENE, PARTIAL CDS

H05563 

SOLUTE CARRIER FAMILY 9 (SODIUM/HYDROGEN EXCHANGER), ISOFORM 3 REGULATORY FACTOR 1 

AA425299 

179211 

H50224

179211

H50224 

FRUCTOSE-BISPHOSPHATASE 1 

AA699427 

HUMAN ENDOGENOUS RETROVIRUS ENVELOPE REGION MRNA (PL1) 

AA701655

X-BOX BINDING PROTEIN 1

W90128

HEPATOCYTE NUCLEAR FACTOR 3, ALPHA

T74639

GATA-BINDING PROTEIN 3

H72474

GATA-BINDING PROTEIN 3 

R31442 

GATA-BINDING PROTEIN 3 

R31441 

GATA-BINDING PROTEIN 3 

AA058828 

ESTROGEN RECEPTOR 1 

AA291702 

ESTROGEN RECEPTOR 1 

AA291749 

ANNEXIN XXXI

N76688

ANNEXIN XXXI

N76688

HOMO SAPIENS MRNA; CDNA DKFZP434A091 (FROM CLONE DKFZP434A091) 

AA431988 

CANALICULAR MULTISPECIFIC ORGANIC ANION TRANSPORTER C 
N80617 
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HOMO SAPIENS MRNA FOR NEUROBLASTOMA, COMPLETE CDS 

AA481950

CELLULAR RETINOIC ACID-BINDING PROTEIN 2

AA036987

CELLULAR RETINOIC ACID-BINDING PROTEIN 2 

AA598508 

CELLULAR RETINOIC ACID-BINDING PROTEIN 2 

AA036986 

HUMAN SECRETORY PROTEIN (P1.B) MRNA, COMPLETE CDS

N74131

MSH (DROSOPHILA) HOMEO BOX HOMOLOG 2

AA195636

HUMAN CHROMOSOME 16 BAC CLONE CIT987SK-254P9
H23265

204483

H58234

HUMAN INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN 5 (IGFBP5) MRNA

T52830

INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN 5 (IGFBP5)
AA054451

HUMAN INSULIN-LIKE GROWTH FACTOR BINDING PROTEIN 5 (IGFBP5) MRNA

H08560

HUMAN MRNA FOR KIAA0061 GENE, PARTIAL CDS

N33237

HUMAN MRNA FOR KIAA0143 GENE, PARTIAL CDS

AA112057

CYSTEINE-RICH PROTEIN 2

AA485427

PDGF BETA

T49539 

67654

T49539

RAS HOMOLOG GENE FAMILY, MEMBER B

H89046

RAS HOMOLOG GENE FAMILY, MEMBER B

AA495846

140018

R63971

140018

R63971

81475

T63511 

CYTOCHROME P450, SUBFAMILY IIJ (ARACHIDONIC ACID EPOXYGENASE) POLYPEPTIDE 2 

H09076 

P53-INDUCED PROTEIN 7

H12189

HOMO SAPIENS BREAST CANCER PUTATIVE TRANSCRIPTION FACTOR (ZABC1) MRNA, COMPLETE CDS
AA460802

HOMO SAPIENS BREAST CANCER PUTATIVE TRANSCRIPTION FACTOR (ZABC1) MRNA, COMPLETE CDS 

AA782S28 

SULFOTRANSFERASE FAMILY 2B, MEMBER 1

R73584

HEREDITARY HEMOCHROMATOSIS

R07647

MUCIN 1, TRANSMEMBRANE

AA488073

156053

R72491 
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447786 

AA702350

415317 

W92160 

IGG FC BINDING PROTEIN 

R52030

EPIDIDYMIS-SPECIFIC, WHEY-ACIDIC PROTEIN TYPE, FOUR-DISULFIDE CORE

AA451904

SRC KINASE-ASSOCIATED PHOSPHOPROTEIN OF 55 KDA

R01281

CARBONIC ANHYDRASE XI

N52089

PHOSPHOFRUCTOKINASE, MUSCLE 

AA099169 

HUMAN HEART MRNA FOR HEAT SHOCK PROTEIN 90, PARTIAL CDS 

AA064973 

130843 

R22306

470105 

AA029949 

H2B HISTONE FAMILY, MEMBER Q 

AA010223

H2B HISTONE FAMILY, MEMBER Q

AA456695

H2A HISTONE FAMILY, MEMBER L 

N50797 

H1 HISTONE FAMILY, MEMBER 2 

T66816 

322461

W15305 

289734 

N62965 

DUAL SPECIFICITY PHOSPHATASE 4

AA444049

CALCIUM CHANNEL, VOLTAGE-DEPENDENT, ALPHA 2/DELTA SUBUNIT 2

N53512

ACYL-COENZYME A DEHYDROGENASE, SHORT/BRANCHED CHAIN

H96140

CYTOCHROME P450, SUBFAMILY IIB (PHENOBARBITAL-INDUCIBLE), POLYPEPTIDE 6

H41908

PROTEASE INHIBITOR 12 (NEUROSERPIN)

AA115876

HUMAN DNA SEQUENCE FROM CLONE 167A19 ON CHROMOSOME 1P32.1-33. CONTAINS THREE GENES FOR
NOVEL PROTEINS, THE DIO1 GENE FOR TYPE I IODOTHYRONINE DEIODINASE (EC 3.8.1.4, TXDI1, ITDI1) AND AN
HNRNP A3 (HETEROGENOUS NUCLEAR RIBONUCLEOPR
N74025

AUTOCRINE MOTILITY FACTOR RECEPTOR

AA479243

CYTOCHROME P450, SUBFAMILY IIA (PHENOBARBITAL-INDUCIBLE), POLYPEPTIDE 7

T73031 

ANGIOTENSIN RECEPTOR 1 

H66116 

ESTS, WEAKLY SIMILAR TO TUMOROUS IMAGINAL DISCS PROTEIN TID56 HOMOLOG [H.SAPIENS] 

T95268 

QUINOID DIHYDROPTERIDINE REDUCTASE

R38198

LYMPHOID NUCLEAR PROTEIN RELATED TO AF4

H99588 






WO 02/08765 



PCT/US01/23843 



49/610 



NUCLEOPORIN 88KD 
AA479888



307220 
N95180 



HOMO SAPIENS MRNA; CDNA DKFZP564P0662 (FROM CLONE DKFZP564P0662)
R27680 



HEPSIN (TRANSMEMBRANE PROTEASE, SERINE 1) 
H62162 



ESTS, HIGHLY SIMILAR TO TRANSCRIPTION ELONGATION FACTOR TFIIS.H [H.SAPIENS]
R09980 



795744 
AA460298 



N-ACETYLTRANSFERASE 1 (ARYLAMINE N-ACETYLTRANSFERASE) 
R91803 



N-ACETYLTRANSFERASE 1 (ARYLAMINE N-ACETYLTRANSFERASE) 
T67128 



503581 
AA131239 



HUMAN BREAST CANCER, ESTROGEN REGULATED LIV-1 PROTEIN (LIV-1) MRNA PARTIAL CDS 
H29407 



N-ACYLSPHINGOSINE AMIDOHYDROLASE (ACID CERAMIDASE)
AA664155



EPOXIDE HYDROLASE 2, CYTOPLASMIC 
R73525 



B-CELL CLL/LYMPHOMA 2 
W63749 



ESTS, HIGHLY SIMILAR TO (DEFLINE NOT AVAILABLE 492955>) [H.SAPIENS]
T74688 



BASIC HELIX-LOOP-HELIX DOMAIN CONTAINING, CLASS B, 2 
T62084 



FORKHEAD (DROSOPHILA) HOMOLOG 1 (RHABDOMYOSARCOMA) 
AA448277 



ACTIVATED P21CDC42HS KINASE 
AA427891 



HUMAN MRNA FOR KIAA0303 GENE, PARTIAL CDS 
AA418846



487929 
AA045481 



ZINC FINGER PROTEIN HOMOLOGOUS TO ZFP103 IN MOUSE 
AA429297 



CELL DIVISION CYCLE 4-LIKE 
AA041499 



ESTS, WEAKLY SIMILAR TO P1 .1 1659 5 
N47593 



ESTS, WEAKLY SIMILAR TO (DEFLINE NOT AVAILABLE 4502327) [H.SAPIENS] 
T72850 



ESTS, MODERATELY SIMILAR TO !!!! ALU SUBFAMILY SQ WARNING ENTRY !!!! [H.SAPIENS]
R70598 



220376 
H86813 



HOMO SAPIENS MRNA; CDNA DKFZP434H071 (FROM CLONE DKFZP434H071) 
T41078 



ESTS, WEAKLY SIMILAR TO !!!! ALU SUBFAMILY J WARNING ENTRY !!!! [H.SAPIENS]
AA669222



T3 RECEPTOR-ASSOCIATING COFACTOR-1 [HUMAN, FETAL LIVER, MRNA, 2930 NT]
AA400234
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416556 

W86987

418240

W90241

KIAA0130 GENE PRODUCT 

N76581 

ERBB-2 RECEPTOR PROTEIN-TYROSINE KINASE PRECURSOR 

AA025141 

STEROIDOGENIC ACUTE REGULATORY PROTEIN RELATED 

AA504710 

ERBB2-POLYA 

X03363

V-ERB-B2 AVIAN ERYTHROBLASTIC LEUKEMIA VIRAL ONCOGENE HOMOLOG 2
(NEURO/GLIOBLASTOMA DERIVED ONCOGENE HOMOLOG)
AA025141

V-ERB-B2 AVIAN ERYTHROBLASTIC LEUKEMIA VIRAL ONCOGENE HOMOLOG 2
(NEURO/GLIOBLASTOMA DERIVED ONCOGENE HOMOLOG)

AA443351

ERBB2
AA481939

GROWTH FACTOR RECEPTOR-BOUND PROTEIN 7 

H53703 

68400 

T57034

68400 

T57034 

SWI/SNF RELATED, MATRIX ASSOCIATED, ACTIN DEPENDENT REGULATOR OF CHROMATIN, 
SUBFAMILY E, MEMBER 1 

W63613

ESTS, WEAKLY SIMILAR TO ENVELOPE PROTEIN [H.SAPIENS]

W37778

271076 

N29918 
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Table 8: Luminal gene subset 

B-CELL CLL/LYMPHOMA 2

W63749

ESTS, WEAKLY SIMILAR TO MEMBRANE GLYCOPROTEIN [M.MUSCULUS]

AA159578

51700 

H22854 

NEBULETTE 

N77806 

HUMAN DNA SEQUENCE FROM CLONE 167A19 ON CHROMOSOME 1P32.1-33. CONTAINS THREE
GENES FOR NOVEL PROTEINS, THE DIO1 GENE FOR TYPE I IODOTHYRONINE DEIODINASE (EC
3.8.1.4, TXDI1, ITDI1) AND AN HNRNP A3 (HETEROGENOUS NUCLEAR RIBONUCLEOPROTEIN

N74025

PROLACTIN RECEPTOR 

R63647 

202658 

H53479 

202658 

H53479 

609283 

AA167189 

MYOSIN VI 

AA625890

470216 

AA028987 

N-ACETYLTRANSFERASE 1 (ARYLAMINE N-ACETYLTRANSFERASE) 
R91803 

HOMO SAPIENS MRNA; CDNA DKFZP434A091 (FROM CLONE DKFZP434A091)

AA431988 

358936 

W92233

SEVEN IN ABSENTIA (DROSOPHILA) HOMOLOG 2 

AA029041 

HEPSIN (TRANSMEMBRANE PROTEASE, SERINE 1) 

H62162 

417081 

W87826

470105

AA029949

HUMAN SECRETORY PROTEIN (P1.B) MRNA, COMPLETE CDS

N74131 

HEPATOCYTE NUCLEAR FACTOR 3, ALPHA 

T74639 

X-BOX BINDING PROTEIN 1 

W90128

ESTROGEN RECEPTOR 1 

AA291702 

ESTROGEN RECEPTOR 1 

AA291749 

GATA-BINDING PROTEIN 3 

H72474 

GATA-BINDING PROTEIN 3

R31441

GATA-BINDING PROTEIN 3
R31442

ANNEXIN XXXI

N76688

HUMAN BREAST CANCER, ESTROGEN REGULATED LIV-1 PROTEIN (LIV-1) MRNA, PARTIAL CDS 

H29407 

346321 

W74079

HUMAN CHROMOSOME 16 BAC CLONE CIT987SK-254P9 

H23265 

71863 

T52564 

271989 

N31935

ESTS, HIGHLY SIMILAR TO INOSITOL POLYPHOSPHATE 4-PHOSPHATASE TYPE II-ALPHA
[H.SAPIENS] 

R86721 

179211 

H50224 

179211 

H50224 

MURINE LEUKEMIA VIRAL (BMI-1) ONCOGENE HOMOLOG

T87515

MURINE LEUKEMIA VIRAL (BMI-1) ONCOGENE HOMOLOG

AA478036

LUTHERAN BLOOD GROUP (AUBERGER B ANTIGEN INCLUDED)

H24954

HOMO SAPIENS (PWD) GENE MRNA, 3' END

N26536

782547

AA431796

ACYL-COENZYME A DEHYDROGENASE, SHORT/BRANCHED CHAIN

H96140

CARNITINE PALMITOYLTRANSFERASE II 

N70848 

ALDO-KETO REDUCTASE FAMILY 7, MEMBER A2 (AFLATOXIN ALDEHYDE REDUCTASE) 

T62865 

CYTOCHROME P450, SUBFAMILY IIA (PHENOBARBITAL-INDUCIBLE), POLYPEPTIDE 7 

T73031

ANGIOTENSIN RECEPTOR 1

H66116

LYMPHOID NUCLEAR PROTEIN RELATED TO AF4

H99588

HUMAN MRNA FOR KIAA0303 GENE, PARTIAL CDS

AA418846

EPOXIDE HYDROLASE 2, CYTOPLASMIC

R73525 

DUAL SPECIFICITY PHOSPHATASE 4 
AA444049 
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Table 9: Basal gene subset 1 



DUAL SPECIFICITY PHOSPHATASE 6 
AA630374

LAMININ, GAMMA 2 (NICEIN (100KD), KALININ (105KD), BM600 (100KD), HERLITZ JUNCTIONAL EPIDERMOLYSIS
BULLOSA))

AA677534

MATRIX METALLOPROTEINASE 14 (MEMBRANE-INSERTED) 

N33214 

COLLAGEN, TYPE XVII, ALPHA 1 

H87536 

CALPONIN 1, BASIC, SMOOTH MUSCLE 

AA399519 

PLEIOTROPHIN (HEPARIN BINDING GROWTH FACTOR 8, NEURITE GROWTH-PROMOTING FACTOR 1) 

AA001449 

PLEIOTROPHIN (HEPARIN BINDING GROWTH FACTOR 8, NEURITE GROWTH-PROMOTING FACTOR 1)

AA001449

1912786 

AI304356 

GELSOLIN (AMYLOIDOSIS, FINNISH TYPE) 

H72027 

BULLOUS PEMPHIGOID ANTIGEN 1 (230/240KD) 

H44784

SMALL INDUCIBLE CYTOKINE SUBFAMILY D (CYS-X3-CYS), MEMBER 1 (FRACTALKINE, NEUROTACTIN)

R66139

KERATIN 17

AA026642

KERATIN 17

AA026642

KERATIN 5 (EPIDERMOLYSIS BULLOSA SIMPLEX DOWLING-MEARA/KOBNER/WEBER-COCKAYNE TYPES)

W72110

ESTS, HIGHLY SIMILAR TO KERATIN K5, 58K TYPE II, EPIDERMAL 

W72110 

ESTS, HIGHLY SIMILAR TO PROBABLE ATAXIA-TELANGIECTASIA GROUP D PROTEIN [H.SAPIENS] 

AA055486 

CRYSTALLIN, ALPHA B 

AA504943 

CAVEOLIN 2 

T89391 

ANNEXIN I (LIPOCORTIN I) 

H63077 

DYSTROPHIN (MUSCULAR DYSTROPHY, DUCHENNE AND BECKER TYPES), INCLUDES DXS142, DXS164, DXS206, 
DXS230, DXS239, DXS268, DXS269, DXS270, DXS272 

AA461118 

DIHYDROPYRIMIDINASE-LIKE 2 
AA487674 
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Table 9: Basal gene subset 2 



EPIDERMAL GROWTH FACTOR RECEPTOR (AVIAN ERYTHROBLASTIC LEUKEMIA VIRAL (V-ERB-B) 
ONCOGENE HOMOLOG) 

AA234783 

GRO1 ONCOGENE (MELANOMA GROWTH STIMULATING ACTIVITY, ALPHA)

W42723

PHOSPHOINOSITIDE-3-KINASE, REGULATORY SUBUNIT, POLYPEPTIDE 1 (P85 ALPHA)

R54050

HUMAN DNA-BINDING PROTEIN ABP/ZF MRNA, COMPLETE CDS

W88571

ANTILEUKOPROTEINASE

AA026192

FATTY ACID BINDING PROTEIN 7, BRAIN

W72051

CHITINASE 3-LIKE 2

AA668821

TRANSMEMBRANE 4 SUPERFAMILY MEMBER 1

AA088439

TRANSMEMBRANE 4 SUPERFAMILY MEMBER 1 

N47476 

HOMO SAPIENS MRNA FOR CALPAIN-LIKE PROTEASE CANPX 

AA457330 

KERATIN 7 

AA489569

LADININ 1

T97710

CADHERIN 3, P-CADHERIN (PLACENTAL) 

AA425556 

PROTEIN TYROSINE PHOSPHATASE, RECEPTOR TYPE, K 

R79082 

SRY (SEX-DETERMINING REGION Y)-BOX 9 (CAMPOMELIC DYSPLASIA, AUTOSOMAL SEX-REVERSAL) 

AA400739 

KERATIN 13 

W23757 

KERATIN 13 

W60057 

2255577 

AI679149 

INTEGRIN, BETA 4 

AA076514 

TROPONIN I, SKELETAL, FAST 
AA181334 
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Table 10: ErbB2 gene subset 



ERBB-2 RECEPTOR PROTEIN-TYROSINE KINASE PRECURSOR 

AA025141 

ERBB2 

AA481939 

ERBB2-POLYA 

X03363 

V-ERB-B2 AVIAN ERYTHROBLASTIC LEUKEMIA VIRAL ONCOGENE HOMOLOG 2 
(NEURO/GLIOBLASTOMA DERIVED ONCOGENE HOMOLOG) 

AA443351 

V-ERB-B2 AVIAN ERYTHROBLASTIC LEUKEMIA VIRAL ONCOGENE HOMOLOG 2 
(NEURO/GLIOBLASTOMA DERIVED ONCOGENE HOMOLOG) 
AA025141 

GROWTH FACTOR RECEPTOR-BOUND PROTEIN 7 

H53703

STEROIDOGENIC ACUTE REGULATORY PROTEIN RELATED

AA504710 

68400 

T57034 

68400 

T57034 

SWI/SNF RELATED, MATRIX ASSOCIATED, ACTIN DEPENDENT REGULATOR OF CHROMATIN, 
SUBFAMILY E, MEMBER 1 

W63613 

TNF RECEPTOR-ASSOCIATED FACTOR 4 

AA598826 

347348 

W81186 

FLOTILLIN 2 

R73545 

TGFB1-INDUCED ANTI-APOPTOTIC FACTOR 1

AA446222

Table 11: Endothelial Gene Subset



TISSUE FACTOR PATHWAY INHIBITOR (LIPOPROTEIN-ASSOCIATED COAGULATION INHIBITOR)

T47454

ALDEHYDE DEHYDROGENASE 1, SOLUBLE

AA664101

HOMO SAPIENS MRNA FOR KIAA0758 PROTEIN, PARTIAL CDS

N95226

VON WILLEBRAND FACTOR

AA487787

PLATELET/ENDOTHELIAL CELL ADHESION MOLECULE (CD31 ANTIGEN)

R22412 

MANIC FRINGE (DROSOPHILA) HOMOLOG 

H22922 

INTERCELLULAR ADHESION MOLECULE 2 

R21535 

245147 

N76361 

REGULATOR OF G-PROTEIN SIGNALLING 5 

AA668470 

TEK TYROSINE KINASE, ENDOTHELIAL (VENOUS MALFORMATIONS, MULTIPLE CUTANEOUS
AND MUCOSAL)

H02848

LIM BINDING DOMAIN 2

H74106

KINASE SCAFFOLD PROTEIN GRAVIN 

AA478542 

359722 

AA011182 

TYROSINE KINASE WITH IMMUNOGLOBULIN AND EPIDERMAL GROWTH FACTOR HOMOLOGY 
DOMAINS 

AA432062 

CD34 ANTIGEN 

AA434483 

HUMAN DNA SEQUENCE FROM CLONE 1033B10 ON CHROMOSOME 6P21.2-21.31. CONTAINS
THE BING5 GENE, EXONS 11 TO 15 OF THE BING4 GENE, THE GENE FOR GALT3 (BETA3-GALACTOSYLTRANSFERASE),
THE RPS18 (40S RIBOSOMAL PROTEIN S18) GENE, THE SACM2

N78611 

69672 

T53626 

HOMO SAPIENS KDR/FLK-1 PROTEIN MRNA, COMPLETE CDS 
AA026831 
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Table 12: Stromal/Fibroblast Gene Subset 



MUSCULIN (ACTIVATED B-CELL FACTOR-1) 

AA470081 

COLLAGEN, TYPE V, ALPHA 1 

R75635 

471748 

AA035018 

SMOOTH MUSCLE ACTIN, ALPHA2 

AA040169 

TRANSGELIN/SM22 

AA010664 

SMOOTH MUSCLE PROTEIN 22-ALPHA 

AA010664 

LUMICAN 

AA035657 

FIBULIN 1 

AA614680 

COLLAGEN, TYPE VI, ALPHA 3 

R62603 

HOMO SAPIENS OSF-2 MRNA FOR OSTEOBLAST SPECIFIC FACTOR 2 (OSF- 
2P1), COMPLETE CDS 

AA598653 

COLLAGEN, TYPE III, ALPHA 1 (EHLERS-DANLOS SYNDROME TYPE IV,
AUTOSOMAL DOMINANT)

T98612

COLLAGEN, TYPE I, ALPHA 1

W90360

COLLAGEN, TYPE I, ALPHA 2

AA490172 

COLLAGEN, TYPE III, ALPHA 1 (EHLERS-DANLOS SYNDROME TYPE IV, 
AUTOSOMAL DOMINANT) 

AA044829 

COLLAGEN, TYPE III, ALPHA 1 (EHLERS-DANLOS SYNDROME TYPE IV, 
AUTOSOMAL DOMINANT) 

T98612 

COLLAGEN, TYPE I, ALPHA 2 

W93067 

THY-1 CELL SURFACE ANTIGEN

AA496283

HOMO SAPIENS, ALPHA-1 (VI) COLLAGEN

AA046525 

COLLAGEN, TYPE VI, ALPHA 1 

AA047209 

COLLAGEN, TYPE VI, ALPHA 1 

AA047209 

HUMAN ALPHA-2 COLLAGEN TYPE VI MRNA, 3' END

AA633747

HUMAN METHIONINE SYNTHASE MRNA, COMPLETE CDS

AA233650

265694 

N25353 
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Table 13: B-cell gene subset 



IMMUNOGLOBULIN GAMMA 3 (GM MARKER) 

AA663981 

COLONY STIMULATING FACTOR 1 (MACROPHAGE) 

N92646 

NEUTROPHIL CYTOSOLIC FACTOR 1 (47KD, CHRONIC GRANULOMATOUS DISEASE, AUTOSOMAL 1) 

AA489666

IMMUNOGLOBULIN LAMBDA-LIKE POLYPEPTIDE 2

W73790

IMMUNOGLOBULIN LAMBDA LIGHT CHAIN

R50297 

HUMAN REARRANGED IMMUNOGLOBULIN LAMBDA LIGHT CHAIN MRNA 

N64851

HUMAN REARRANGED IMMUNOGLOBULIN LAMBDA LIGHT CHAIN MRNA 

T67053 

HUMAN IG J CHAIN GENE 

H24896

IMMUNOGLOBULIN J CHAIN 

H24896 

HUMAN IG J CHAIN GENE 

T70057 

MAJOR HISTOCOMPATIBILITY COMPLEX, CLASS II, DQ BETA 1 

R73128 

IMMUNOGLOBULIN MU 

H73590 

EARLY DEVELOPMENT REGULATOR 2 (HOMOLOG OF POLYHOMEOTIC 2) 

AA598840

MAX-INTERACTING PROTEIN 1 

AI087032 






Table 14: Adipose-enriched/Normal breast gene subset 



MESENCHYME HOMEO BOX 1 

AA426311 

INSULIN-LIKE GROWTH FACTOR 1 (SOMATOMEDIN C) 

AA456321

CYCLIN-DEPENDENT KINASE INHIBITOR 1C (P57, KIP2) 

R81336 

78946 

T61792 

FATTY ACID BINDING PROTEIN 4, ADIPOCYTE 

AA046090 

FATTY ACID BINDING PROTEIN 4, ADIPOCYTE 

AI652163 

FATTY ACID BINDING PROTEIN 4, ADIPOCYTE 

N92901 

MDGI/FATTY ACID BINDING PROTEIN 3, MUSCLE AND HEART 

AA128926

CD36 ANTIGEN (COLLAGEN TYPE I RECEPTOR, THROMBOSPONDIN RECEPTOR) 

R09416 

CD36 ANTIGEN (COLLAGEN TYPE I RECEPTOR, THROMBOSPONDIN RECEPTOR) 

N39161

GLUTATHIONE PEROXIDASE 3 (PLASMA) 

AA664180 

FOUR AND A HALF LIM DOMAINS 1 

AA456394 

ALCOHOL DEHYDROGENASE 2 (CLASS I), BETA POLYPEPTIDE 

N93428 

AQUAPORIN 7 

H27752

484535 

AA036974 

LIPOPROTEIN LIPASE 

AA633835

GLYCEROL-3-PHOSPHATE DEHYDROGENASE 1 (SOLUBLE) 

AA192547 

RETINOL-BINDING PROTEIN 4, INTERSTITIAL 

T72220 

INTEGRIN, ALPHA 7

AA055979 

85660 

T62068 

PHOSPHOLEMMAN 

H57136

AQUAPORIN 1 (CHANNEL-FORMING INTEGRAL PROTEIN, 28KD)

H24316

APOLIPOPROTEIN A-I

R97710

SMALL INDUCIBLE CYTOKINE SUBFAMILY A (CYS-CYS), MEMBER 14 

R96668 

PEROXISOME PROLIFERATIVE ACTIVATED RECEPTOR, GAMMA 

AA088517 

ENDOTHELIN RECEPTOR TYPE B 
H28710 
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Table 15: Macrophage gene subset 



ESTS, MODERATELY SIMILAR TO !!!! ALU SUBFAMILY SX WARNING ENTRY !!!! [H.SAPIENS] 

T94293 

CHITINASE 1 

T94272 

53341 

R15934 

SMALL INDUCIBLE CYTOKINE SUBFAMILY A (CYS-CYS), MEMBER 18, PULMONARY AND 
ACTIVATION-REGULATED 

AA495985 

FOLYLPOLYGLUTAMATE SYNTHASE 

R44864

LYSOZYME (RENAL AMYLOIDOSIS)

N63943

LYSOZYME (RENAL AMYLOIDOSIS)

N63943 

TRANSCRIPTION FACTOR AP-2 ALPHA (ACTIVATING ENHANCER-BINDING PROTEIN 2 
ALPHA) 

N63770 

LIPASE A, LYSOSOMAL ACID, CHOLESTEROL ESTERASE (WOLMAN DISEASE) 

AA630104

CD68 ANTIGEN

AA421296

ACID PHOSPHATASE 5, TARTRATE RESISTANT

R08816

FC FRAGMENT OF IGE, HIGH AFFINITY I, RECEPTOR FOR; GAMMA POLYPEPTIDE

R79170

CATHEPSIN Z

AA488341 
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Table 16: T-cell gene subset 



INTERLEUKIN 10 RECEPTOR, ALPHA 

AA437226 

INTEGRIN, ALPHA L, CD11A 

R48796

742143

AA406027

T-CELL RECEPTOR, BETA CLUSTER 

N91921 

80186 

T64192 

T-CELL RECEPTOR, DELTA (V.D.J.C) 

AA670107

ESTS, WEAKLY SIMILAR TO S-ACYL FATTY ACID SYNTHETASE THIOESTER HYDROLASE, MEDIUM
CHAIN [R.NORVEGICUS] 

AA470066 

LYMPHOCYTE-SPECIFIC PROTEIN TYROSINE KINASE 

AA420981 

CD3D ANTIGEN, DELTA POLYPEPTIDE (TIT3 COMPLEX) 

AA055946 

CD3G ANTIGEN, GAMMA POLYPEPTIDE (TIT3 COMPLEX) 

T66800 

TRANSCRIPTION FACTOR DP-2 (E2F DIMERIZATION PARTNER 2)
AA465444 
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